(54) Shared virtual space display method and apparatus using said method



(57) A plurality of terminals are connected to a server via a communication network and share a
predetermined common virtual space. Each terminal continually sends to the server the position
coordinates of the viewing point and the direction of eyes of its user's avatar in the virtual space,
and the visual field image viewed from that viewing point is displayed on a display. Based on the
position coordinates and directions of eyes of the avatars of the other terminals, received from
those terminals via the server, each terminal generates an avatar image in the specified direction
and at the specified position and displays it in the visual field. The server is always supplied with
the latest position information of the avatar from every terminal and, when the distance between two
arbitrary avatars becomes smaller than a threshold value, connects speech channels of the two
terminals corresponding to these avatars.

FIG. 2A
Description

BACKGROUND OF THE INVENTION 

The present invention relates to a virtual space display method which allows user terminals connected
to a communication network to freely move their avatars to desired positions in a shared virtual space
and causes the user terminals to display images in their fields of vision in the virtual space. The
invention also pertains to a virtual space sharing apparatus using the above-mentioned virtual space
display method.

As virtual space systems wherein a plurality of users enter a shared virtual space via a communication
network from their terminals connected thereto and communicate or collaborate with each other, there
have been proposed, for example, a multi-user distributed, real-time multimedia conference system by
Nihon IBM Co., Ltd. (Information Processing Society of Japan, 47th National Conference, 2E-5, 1993),
SIMNET by DARPA of the United States Department of Defense, a communication game "HABITAT," the
service of which is now being offered by Fujitsu LTD. on a personal computer communication network,
and a networked virtual reality system by Nippon Electric Co., Ltd. (Shinohara, "Three Dimensional
Configuration Control," Information Processing Society of Japan, Kyushu Symposium, Dec. 1991).

In these conventional virtual space display systems, the virtual space is displayed as a user interface
of a specific application such as a combat simulation, electronic mail system or electronic conference
system. Users are allowed to move their avatars in the virtual space, but since video images that the
users observe on their terminal displays are video images captured by their avatars in the virtual
space that is observed from the outside thereof, these systems have the defect that the users cannot
fully feel a sense of real existence in the space. Moreover, when the user avatars meet and talk with
each other in the virtual space, their voices are merely transmitted and received between them; hence,
also from the auditory point of view, the users cannot feel totally immersed in the virtual space. Also
from the visual point of view, the virtual space lacks a sense of real existence or reality since the
avatars of the users are all displayed in the same size.


SUMMARY OF THE INVENTION 

A first object of the present invention is to provide a virtual space display method which gives users
a sense of real existence in the virtual space and a virtual space sharing apparatus utilizing the
method.

A second object of the present invention is to provide a virtual space display method which lends
realism to the virtual space auditorily and/or visually and a virtual space sharing apparatus utilizing
the method.

According to its first aspect, the present invention is directed to a virtual space display method and
a virtual space sharing apparatus for use with a virtual space system in which a plurality of user
terminals connected to a communication network share a predetermined common virtual space and create
and display visual field images which change as avatars of the users move in the virtual space. Each
user terminal generates, by input control means, signals which respectively select and specify the
position and direction of eyes of the avatar of the terminal user in the virtual space and produces, by
visual field image producing means, a visual field image captured in the specified direction of eyes of
the avatar from its position specified as a viewing point in the virtual space. Position information
send/receive means sends the specified position and direction of eyes of the avatar as position
information to the communication network and receives therefrom position information sent thereto from
other terminals. Then, through utilization of the received position information, the terminal produces,
by avatar image producing means, avatar images of the users of the other terminals in the visual field
image at the positions defined by the received position information and displays, on display means, a
combined image including the visual field image and the avatar images.

According to a second aspect of the present invention, in the method and apparatus of the first aspect
of the invention, a group of avatars which satisfy a conversation enable condition between them is
searched for, and the terminals of the avatars in the same group are each supplied with voices of the
other avatars mixed by common mixer means.

According to a third aspect of the present invention, in the method and apparatus of the first aspect
of the invention, speech data of all avatars are mixed by mixer means to produce an environment sound
for supply to each avatar.

According to a fourth aspect of the present invention, each user terminal uses the relationship between
the position information of its avatar and that of the other avatars to determine the speech quality of
the other avatars, then controls voices of the latter to have the thus determined quality, thereafter
mixing them.

According to a fifth aspect of the present invention, each user terminal uses the relationship between
position information of its avatar and that of the other avatars to determine the image quality of the
latter, then requests the other terminals or a server for video images of the other avatars, each
having the thus determined quality, and the other terminals or server sends the requested images of the
avatars to the requesting terminal after converting them into video images of the specified quality.
BRIEF DESCRIPTION OF THE DRAWINGS



Fig. 1A is a diagram schematically showing an example of a distributed connection type system to which
the present invention is applied;

Fig. 1B is a diagram schematically showing another example of the distributed connection type system to
which the present invention is applied;

Fig. 2A is a diagram schematically showing an example of a centralized connection type system to which
the present invention is applied;

Fig. 2B is a diagram schematically showing another example of the centralized connection type system to
which the present invention is applied;

Fig. 3 is a block diagram illustrating the construction of a terminal according to a first embodiment
of the present invention;

Fig. 4A is a perspective view for explaining a virtual space which is shared by terminals;

Fig. 4B is a diagram showing a visual field image at one viewing point in the virtual space depicted in
Fig. 4A;

Fig. 4C is a diagram showing a visual field image at a viewing point shifted from that in Fig. 4B;

Fig. 5 is a block diagram illustrating an example of a server in the first embodiment;

Fig. 6 is a diagram showing an example of the configuration of a message for transmission between a
terminal and the server;

Fig. 7 is a block diagram illustrating an example of a terminal control part of the terminal shown in
Fig. 3;

Fig. 8 is a table showing the configuration of data that is held in a management table memory in
Fig. 7;

Fig. 9A is a diagram showing the relationship between avatars when a conversation enable condition is
the distance between them in a second embodiment of the present invention;

Fig. 9B is a diagram showing the relationship between avatars when the conversation enable condition is
the field of vision;

Fig. 9C is a diagram showing another example of the relationship between avatars when the conversation
enable condition is their fields of vision;

Fig. 9D is a diagram showing the relationship among three avatars when the conversation enable
condition is the distance between them;

Fig. 9E is a diagram showing the relationship among three avatars when the conversation enable
condition is their fields of vision;

Fig. 9F is a diagram showing another example of the relationship among three avatars when the
conversation enable condition is their fields of vision;

Fig. 10 is a block diagram illustrating an example of the construction of the server in the second
embodiment;

Fig. 11 is a block diagram illustrating the constructions of a distance deciding part and an eye
contact deciding part in Fig. 10;

Fig. 12 is a diagram showing the positional relationship between avatars, for explaining the principles
of detection of their eye contact;

Fig. 13 is a diagram showing the positional relationship among avatars, for explaining an environment
sound;

Fig. 14 is a diagram showing the state of channel connection in a server of a third embodiment of the
present invention which generates an environment sound in the case of Fig. 13;

Fig. 15 is a block diagram illustrating the construction of the server in the third embodiment;

Fig. 16 is a block diagram illustrating the construction of a terminal for use in the case where the
third embodiment is realized as a distributed connection type system;

Fig. 17 is a diagram showing an example of the assignment of priorities to avatars on the basis of
distance;

Fig. 18 is a diagram showing an example of the assignment of priorities to avatars on the basis of
field of vision;

Fig. 19 is a diagram showing an example of the assignment of priorities to avatars on the basis of the
direction of eyes;

Fig. 20 is a diagram showing an example of the assignment of priorities to avatars on the basis of eye
contact;

Fig. 21 is a block diagram illustrating a server in a fourth embodiment of the present invention which
controls the speech quality on the basis of the priorities assigned to avatars;

Fig. 22 is a block diagram illustrating the terminal configuration in the fourth embodiment;

Fig. 23 is a block diagram illustrating the terminal configuration in an embodiment of the distributed
connection type system which controls speech quality;

Fig. 24 is a block diagram illustrating another example of the terminal configuration in a fifth
embodiment of the present invention which performs speech quality control on demand;

Fig. 25 is a diagram showing an example of classifying the image quality of avatars on the basis of
distance in a sixth embodiment of the present invention;

Fig. 26 is a diagram showing a display image which is provided in the case of Fig. 25;

Fig. 27 is a block diagram illustrating the construction of the server in the sixth embodiment;

Fig. 28 is a block diagram illustrating the terminal configuration for use in the centralized
connection type system; and

Fig. 29 is a block diagram illustrating the terminal configuration for use in the distributed
connection type system.
DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the present invention, a plurality of terminals connected via a communication network share a
virtual space and are allowed to freely move avatars of terminal users in the virtual space and display
on their terminal displays the scenes that the avatars are observing in the virtual space. Images
representing the avatars of the users (which may be symbols, illustrations of the users or
illustrations with users' facial video images embedded therein and which will hereinafter be referred
to simply as avatar images) are formed at the positions of the avatars in the virtual space.
Accordingly, the scene in the virtual space that is displayed on a display unit of each user terminal
contains avatar images of other users in the field of vision of the avatar of each user in the virtual
space. The virtual space display system of the present invention can also be designed so that the users
receive predetermined services such as various entertainments, shopping and various pieces of
information, but the system of the present invention is configured, in particular, to allow the avatar
of each user to talk with other avatars whom it happens to meet in the virtual space.

The system of the present invention can be designed as either a distributed connection type system or a
centralized one. In the distributed connection type system, as shown in Fig. 1A, a plurality of
terminals 10₁, 10₂ and 10₃ connected to the thick-lined communication network such as a LAN (local area
network) are each adapted to form a common virtual space and to send and receive data to and from the
other terminals as indicated by the thin-lined arrows. Each terminal sends data representing the
position of the avatar of the user in the virtual space and data representing the direction of eyes of
the avatar (hereinafter referred to as position information) to all the other terminals at regular time
intervals or when the position or direction data changes. Upon receiving the position data and
direction-of-eyes data from other terminals, each terminal checks the data to see if the avatars of the
other terminal users exist in the visual field of its avatar, and if so, the terminal displays the
avatar images of the other terminal users at the positions specified by the position data received.
Moreover, as explained with reference to an embodiment described later on, each user sends his voice or
speech from his terminal to all the other terminals, and as described later in respect of another
embodiment, if necessary, the user sends, for example, his facial video to other terminals by request.

In the centralized connection type system, as depicted in Fig. 2A, the terminals 10₁, 10₂ and 10₃ are
all connected to a server 50 via a communication network such as a LAN and perform two-way
communication with the server 50 as indicated by the thin-lined arrows. In this instance, each terminal
sends at least the position information of the avatar of its user to the server 50; the server 50
performs required processing on the basis of the position information received from each terminal and
sends the processed position information to all the terminals 10₁, 10₂ and 10₃. Fig. 2B shows the case
where the terminals 10₁, 10₂ and 10₃ are all connected to the server 50, for example, via ISDN.

First Embodiment 

Fig. 3 schematically illustrates an example of the configuration of each terminal unit 10 which forms
the virtual space sharing apparatus of the present invention for use in the centralized connection type
system. The terminal unit 10 has a channel interface part 11 connected to a network (LAN, ISDN or the
like), a terminal control device 12, a display 13, a control device 14, a speaker SP, a microphone MC
and a video camera VC.

Fig. 4A schematically illustrates the architecture of a virtual space VS provided beforehand for the
terminal control device 12 of the terminal unit 10₁ of a user U1, positions P1 and P2 (given as
coordinate values) of avatars A1 and A2 of users in the virtual space VS and the directions of eyes
(indicated by the arrows ED1 and ED2) of the avatars A1 and A2. Moreover, position P1' indicates the
position of the avatar A1 having moved thereto, and the direction of eyes at the position P1' is
indicated by the arrow ED1'. On the other hand, Fig. 4B shows a visual field image that the avatar A1
observes in the direction ED1 from the position P1; this visual field image is displayed on the display
13 of the terminal unit 10₁ of the user U1. Fig. 4C shows a visual field image that the avatar A1 in
Fig. 4A observes at the position P1' after having moved thereto, the direction of its eyes being
indicated by the arrow ED1'.

When the user U1 instructs, by a joystick or similar control device 14 of his terminal 10₁, his avatar
in the virtual space VS to move rightward from the position P1 to the position P1' as shown in Fig. 4A,
the terminal control device 12 responds to the "move" instruction to display on the display 13 the
visual field image in the virtual space VS viewed from the new position P1' (Fig. 4C) in place of the
visual field image from the position P1 displayed until then (Fig. 4B), and the terminal control device
12 sends the new position P1' from the channel interface part 11 to the server 50 via the communication
network NW. The avatar image A1 representing the user U1 in the virtual space VS is not displayed on
the display 13 of his own terminal 10₁. In this embodiment, the avatar image A2 of the other user U2 is
displayed in the visual field image viewed from the viewing point P1' (Fig. 4C).

The server 50 has, as shown in Fig. 5, a channel interface part 51, a connection control part 52 and a
table memory 53. The channel interface part 51 is connected via the communication network NW to the
terminal units 10₁ and 10₂, receives therefrom the position information of their avatars, that is, the
viewing points P1 and P2 and directions of eyes ED1 and ED2 of the avatars A1 and A2, transmits the
position information to all terminals except the transmitting one and controls audio and video channel
connection between the terminals specified by the connection control part 52. The connection control
part 52 writes the received position information, that is, the coordinates (x, y) of the positions and
the directions of eyes ED, in a position information table 53A of the table memory 53 in correspondence
with the respective terminals. (The virtual space is three-dimensional and the position of each avatar
is expressed by three-dimensional coordinates, but positions will hereinafter be expressed by
two-dimensional coordinates.) According to the present invention, when the relationship between two
arbitrary avatars satisfies a predetermined condition after the updating of the data stored in the
memory 53, the terminals corresponding to the two avatars are connected via the channel interface part
51 to enable communication or conversation between the users of these terminals. The conversation
enable condition consists of, for example, the distance between the avatars and the degree of
eye-to-eye contact between them as described later with reference to other embodiments. The connection
control part 52 calculates the distance d between the avatars A1 and A2, for example, from the table
53A by $d^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2$, and when d < D and the degree of eye-to-eye contact
defined by the directions of eyes ED1 and ED2 of the avatars A1 and A2 satisfies a predetermined
condition, the connection control part 52 instructs the channel interface part 51 to connect the
channel between the terminals 10₁ and 10₂ corresponding to the avatars A1 and A2 and writes the state
of connection (indicated by a white circle) of the avatars A1 and A2 in a state-of-connection table
53B.
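The distance test performed by the connection control part 52 can be illustrated with a minimal Python sketch. The dictionary layout, the terminal labels and the function names are assumptions made for illustration only; the eye-to-eye contact condition is deliberately left out, so the sketch covers only the d < D part of the decision.

```python
from itertools import combinations

# Hypothetical stand-in for position information table 53A:
# terminal label -> two-dimensional avatar position (x, y).
position_table = {
    "10-1": (0.0, 0.0),
    "10-2": (3.0, 4.0),
    "10-3": (20.0, 0.0),
}


def squared_distance(p, q):
    """Return d^2 = (x1 - x2)^2 + (y1 - y2)^2 for two positions."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def update_connections(positions, threshold):
    """Return the terminal pairs whose avatars satisfy d < D.

    The returned set plays the role of state-of-connection table 53B.
    Comparing squared values avoids taking a square root.
    """
    connected = set()
    for a, b in combinations(sorted(positions), 2):
        if squared_distance(positions[a], positions[b]) < threshold ** 2:
            connected.add((a, b))
    return connected


connections = update_connections(position_table, threshold=6.0)
print(connections)
```

With the sample positions, only the first two avatars are closer than the assumed threshold of 6.0 (their distance is 5.0), so only that pair would have its channel connected.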

The channel interface part 51 relays processed audio and video data between the terminals 10₁ and 10₂,
that is, sends the data received from the terminal 10₁ to the terminal 10₂ and the data received from
the latter to the former.

The terminal control part 12 of the terminal 10₁ decodes the audio data received from the terminal 10₂
via the channel interface part 51 and outputs the sound from the speaker SP, creates the avatar image
at the position specified by the coordinate value (x₂, y₂) contained in the position information
received from the terminal 10₂ and outputs it to the display 13 in combination with the visual field
image in the virtual space being currently displayed. Similarly, the terminal 10₂ processes and outputs
the audio and video data received from the terminal 10₁.

In the above, when the user of each terminal moves and/or turns his avatar in the virtual space, the position infor- 
mation from the terminal is sent as part of a "move" message MM of such a format as shown in Fig. 6. 

The "move" message MM is composed of an avatar identifier AID, a message identifier MID, a space
identifier SID, a coordinate value COV, the direction of eyes ED and a state flag SFLG. The avatar
identifier AID is a pre-assigned unique number representing the terminal having sent the message. The
message identifier MID is a predetermined number representing the message for sending position
information based on the movement of the avatar. The coordinate value COV and the direction of eyes ED
are a position coordinate value (x, y, z) and a direction-of-eyes value γ (vector) of the avatar in the
virtual space, respectively. The state flag SFLG is a value indicating the state of the avatar (moving,
communicating, selecting, or idle). In this case, the "selecting" state is used in a message for
receiving a service, and a button value for selecting an item from a service menu by the control device
14 is used as the flag. The button value is a value indicating which button of an input device (a mouse
or joystick) is being pressed. The "idle" state indicates that the user is not using the terminal.
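As an illustration of the message layout just described, the fields of the "move" message can be sketched as a small Python record. The field names, the Python types and the numeric sample values below are assumptions; the text fixes only which fields exist and what they mean, not their encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class AvatarState(Enum):
    """The four avatar states carried by the state flag SFLG."""
    MOVING = "moving"
    COMMUNICATING = "communicating"
    SELECTING = "selecting"
    IDLE = "idle"


@dataclass(frozen=True)
class MoveMessage:
    """Hypothetical in-memory form of the "move" message MM of Fig. 6."""
    avatar_id: int                               # AID: identifies the sending terminal
    message_id: int                              # MID: marks this as a position message
    space_id: int                                # SID: identifies the virtual space
    coordinate: Tuple[float, float, float]       # position (x, y, z) of the avatar
    eye_direction: Tuple[float, float, float]    # direction of eyes ED, as a vector
    state: AvatarState                           # SFLG


# Sample message; every numeric value here is made up for the example.
msg = MoveMessage(
    avatar_id=1,
    message_id=10,
    space_id=0,
    coordinate=(1.0, 2.0, 0.0),
    eye_direction=(0.0, 1.0, 0.0),
    state=AvatarState.MOVING,
)
print(msg)
```

Making the record immutable reflects that each message is a snapshot of the avatar at the moment it was sent; a later movement produces a new message rather than a modified one.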

Fig. 7 illustrates in block form the configuration of the terminal control part 12 in each terminal
unit 10 of Fig. 3 in a centralized connection type system. The terminal control part 12 comprises: a
video image generating part 12G which generates a CG visual field image viewed in the specified
direction of eyes from the specified coordinate position in the three-dimensional virtual space, for
display on the display 13; a control input processing part 12D which receives the input from the
control device 14 and processes it for conversion to the coordinate value and the button value; a
communication interface 12A which performs processing for transmission and reception to and from the
communication network NW; a file unit 12F which stores display data, virtual space image data, software
and user data; a management table memory 12E; an audio output processing part 12J which receives audio
data and provides an analog speech signal to the speaker SP; an audio/video input part 12K which
performs digital processing of input speech from the microphone MC and a video signal from the video
camera VC and provides them to the server 50; and a CPU 12C which controls the operation of the
terminal control part 12. These components are interconnected via a bus 12B.

In the management table memory 12E, as shown in Fig. 8, there are stored the position coordinate COV
and direction of eyes γ of the user's avatar inputted from the control input processing part 12D
through the manipulation of the control device 14, the position coordinates COV and directions of eyes
γ of other avatars, change flags CFLG and state flags SFLG received from the server 50 (or other
terminals); these pieces of information are stored in correspondence with the respective avatar
identifiers AID. The avatar identifier AID, the state flag SFLG, the coordinate value COV and the
direction of eyes ED are the same as those used in the "move" message depicted in Fig. 6. When these
pieces of data on avatars are updated, the change flag CFLG is set to "1."

Now, a description will be given of the operation of the terminal control part 12 in the terminal 10₁
of the user U1, for instance. The CPU 12C reads out of the management table memory 12E the position
(x, y, z) and direction of eyes γ₁ corresponding to the identifier AID₁ of the avatar A1 of the user
U1, instructs the video image generating part 12G to generate the visual field image observed in the
direction of eyes γ₁ from the position (x, y, z) in the virtual space stored as data in the file unit
12F, detects other avatars present in that field of vision from their coordinate values stored in the
table memory 12E, generates avatar images at the positions of the detected avatars and instructs the
display 13 to display thereon the avatar images in combination with the visual field image. The avatar
images that are displayed on the display 13 are, for example, video images of users' faces received
from the server 50 and produced in sizes corresponding to the distances from the avatar A1 of the user
U1 to the other avatars to be displayed.
The CPU 12C always monitors the change flag CFLG in the management table memory 12E and, upon detecting
a change in the data stored corresponding to the avatar A1 of the user U1 (CFLG=1), instructs the video
image generating part 12G to separately generate the visual field image in the virtual space to be
displayed and the other avatar images to be contained therein and displays them on the display 13,
thereafter resetting the change flag CFLG. When it is detected that the change flag of another avatar
is "1," only its avatar image is updated on the basis of the updated coordinate position COV and
direction of eyes γ, after which the change flag CFLG is reset.
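The flag-driven update just described can be sketched in Python as follows. The table layout, the key name `cflg` and the action labels are assumptions for illustration; the sketch only records which redraws would be requested and resets the flags, and does no actual rendering.

```python
def process_change_flags(table, own_id):
    """Scan the management table once and return the redraws implied by the flags.

    table maps an avatar identifier to a dict holding at least a "cflg" entry
    (1 = data changed since the last redraw, 0 = unchanged).
    """
    actions = []

    # A change to the user's own avatar moves the viewing point, so the whole
    # visual field image and the avatar images in it are regenerated.
    if table[own_id]["cflg"] == 1:
        actions.append(("redraw_visual_field_and_avatars", own_id))
        table[own_id]["cflg"] = 0

    # A change to another avatar only requires updating that avatar's image.
    for aid, entry in table.items():
        if aid != own_id and entry["cflg"] == 1:
            actions.append(("redraw_avatar", aid))
            entry["cflg"] = 0

    return actions


# Avatar 1 is the local user; only avatar 2 has moved since the last scan.
management_table = {1: {"cflg": 0}, 2: {"cflg": 1}, 3: {"cflg": 0}}
actions = process_change_flags(management_table, own_id=1)
print(actions)
```

Keeping the two cases separate is what lets the terminal avoid regenerating the whole scene when only a remote avatar has moved.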

While this embodiment has been described in respect of the centralized connection type system, the
present invention is also applicable to the distributed connection type system. In such an instance,
each terminal sends to all the other terminals the "move" message MM of the format of Fig. 6 which
contains the position information on the avatar of said each terminal and, at the same time, writes the
message into the management table memory 12E of the terminal control part 12 shown in Fig. 7. On the
other hand, each terminal writes into the management table memory 12E the "move" messages MM of the
same format received from the other terminals and, at the same time, forms and displays other avatar
images, which are observed in the field of vision from the position of its user's avatar, at their
specified positions in the visual field image. To implement the distributed connection type system, the
terminal control part 12 of each terminal needs only to incorporate therein, between the communication
interface 12A and the bus 12B, a part that corresponds to the channel interface part 51 of the server
50 in the centralized connection type system shown in Fig. 5. The functions of the connection control
part 52 and the table memory 53 in Fig. 5 can be implemented by the CPU 12C and the management table
memory 12E in Fig. 7, respectively.

Second Embodiment

In the above, a brief description has been given of the case of connecting the audio channel between
two terminals when the distance d between avatars of their users in the virtual space is smaller than a
predetermined value. Next, a description will be given of the conditions that enable conversation
between such avatars and an embodiment of an apparatus which connects the speech channel between them
on the basis of such conditions. The conditions are the distance between two avatars and their visual
angles and directions of eyes.

(a) When the distance d between the avatars A1 and A2, given by their position coordinates, is equal to
or smaller than a predetermined value Da as shown in Fig. 9A, the server 50 interconnects the terminals
10₁ and 10₂ corresponding to the avatars A1 and A2, enabling transmission and reception of speech
between them. In a system like this, the direction of eyes is not taken into account.

(b) When the distance d between the avatars A1 and A2 is equal to or smaller than a predetermined value
Db and at least one of the avatars is in the field of vision of the other avatar as shown in Fig. 9B,
the server 50 interconnects the two corresponding terminals, enabling transmission and reception of
speech between them. The visual angle α is a predetermined value. In the example of Fig. 9B, the avatar
A1 of the user U1 is not displayed on the display unit 13 of the terminal 10₂ of the user U2 but the
avatar A2 of the user U2 is displayed on the terminal display unit 13 of the user U1; hence, the avatar
A1 can start conversation with the avatar A2 by calling to it.

(c) When the distance d between the avatars A1 and A2 is equal to or smaller than a predetermined value
Dc and each of them is in the field of vision of the other as shown in Fig. 9C, the server
interconnects the two corresponding terminals, enabling transmission and reception of speech between
them. Incidentally, when the avatars of two terminal users are each in the field of vision of the
other, it is assumed that they establish eye-to-eye contact.

(d) In such a situation as shown in Fig. 9D wherein a third avatar A3 approaches one (A1, for example)
of the avatars A1 and A2 engaged in conversation with each other in a system utilizing the
above-mentioned condition (a) and a conversation enable condition (d ≤ Dd) is also satisfied between
the avatars A1 and A3, voices of the avatars A1 and A2 are sent to the terminal 10₃ of the avatar A3
after being mixed, voices of the avatars A1 and A3 are sent to the terminal 10₂ of the avatar A2 after
being mixed and voices of the avatars A2 and A3 are sent to the terminal 10₁ of the avatar A1 after
being mixed so as to enable conversation among the avatars A1, A2 and A3 of the three terminal users.

(e) Also in such a situation as shown in Fig. 9E wherein the third avatar A3 approaches one (A1) of the
two avatars A1 and A2 engaged in conversation with each other in a system utilizing the above-mentioned
condition (b), the same processing as the above-described (d) may be performed.

(f) Also in such a situation as shown in Fig. 9F wherein the third avatar A3 enters the field of vision
of one (A2) of the avatars A1 and A2 engaged in conversation with each other in a system utilizing the
above-mentioned condition (c), the same processing as the above-described (d) may be performed.

(g) Alternatively, it is possible to use the above-mentioned condition (c) as the conversation enable
condition for the first two avatars A1 and A2 and a predetermined one of the conditions (a) and (b) as
a condition for the third and subsequent avatars to join the conversation between the avatars A1 and
A2.
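The three pairwise conditions (a) to (c) listed above can be compared side by side in a short Python sketch. Several points are assumptions made for illustration: positions are two-dimensional, the direction of eyes is given as an angle in radians, the visual angle is treated as the full opening angle centred on the direction of eyes, and condition (c) is read as mutual visibility (eye-to-eye contact). The function and key names are hypothetical.

```python
import math


def in_field_of_vision(observer_pos, observer_dir, target_pos, visual_angle):
    """True if the target lies within visual_angle / 2 of the observer's eye direction."""
    bearing = math.atan2(target_pos[1] - observer_pos[1],
                         target_pos[0] - observer_pos[0])
    # Wrap the angular difference into the interval [-pi, pi).
    diff = (bearing - observer_dir + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= visual_angle / 2


def can_converse(a, b, condition, limit, visual_angle):
    """Evaluate conversation enable condition "a", "b" or "c" for two avatars."""
    if math.dist(a["pos"], b["pos"]) > limit:
        return False
    if condition == "a":          # distance only
        return True
    a_sees_b = in_field_of_vision(a["pos"], a["dir"], b["pos"], visual_angle)
    b_sees_a = in_field_of_vision(b["pos"], b["dir"], a["pos"], visual_angle)
    if condition == "b":          # at least one avatar sees the other
        return a_sees_b or b_sees_a
    if condition == "c":          # each avatar sees the other
        return a_sees_b and b_sees_a
    raise ValueError("unknown condition: " + condition)


# A1 looks toward A2, while A2 looks away from A1 (as in the Fig. 9B situation).
a1 = {"pos": (0.0, 0.0), "dir": 0.0}
a2 = {"pos": (2.0, 0.0), "dir": 0.0}
results = {c: can_converse(a1, a2, c, limit=5.0, visual_angle=math.pi / 2)
           for c in ("a", "b", "c")}
print(results)
```

In this arrangement conditions (a) and (b) are satisfied but (c) is not, which matches the description of Fig. 9B: A1 can call out to A2 even though A2 does not yet see A1.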
Fig. 10 illustrates an example of the server of the virtual space sharing apparatus for use in the
centralized connection type system which performs the processing (f). This example is shown to have
three terminals. For the sake of brevity, no video-related parts are shown.

The server 50 exchanges speech data and position information with the terminals via channels CH₁, CH₂
and CH₃. At first, the data received via the channels CH₁, CH₂ and CH₃ are received in channel
interface parts INF1, INF2 and INF3, respectively. The channel interface parts INF1 to INF3 analyze the
received data and, in the case of speech data, transfer it to a speech switching part 52S. When the
received data is position information containing position data and direction-of-eyes data, the channel
interface parts INF1 to INF3 write the position data and the direction-of-eyes data in the table memory
53 and, at the same time, transfer them to a position information distributing part 52A. The position
information distributing part 52A copies the position information received from the channel interface
part INF1 and transfers it to the channel interface parts INF2 and INF3 and copies the position
information received from the channel interface part INF2 and transfers it to the channel interface
parts INF1 and INF3. Furthermore, the position information distributing part 52A copies the position
information and direction-of-eyes information received from the channel interface part INF3 and
transfers them to the channel interface parts INF1 and INF2. A distance decision part 52B reads out the
position information from the table memory 53 and calculates all distances $d_{ij}$ (i, j: 1, 2, 3;
i ≠ j) between avatars $A_i$ and $A_j$. The distance decision part 52B compares each distance $d_{ij}$
with a predetermined threshold value D and sets the distance $d_{ij}$ to a value 1 or 0, depending on
whether $0 < d_{ij} \le D$ or $d_{ij} > D$, and transfers the value to a mixing object determining part
52D. An eye contact decision part 52C uses the position data and the direction-of-eyes data to
calculate a value $w_{ij}$ which indicates whether the avatars are each in the field of vision of the
other. That is to say, the eye contact decision part 52C sets the value $w_{ij}$ to "1" or "0,"
depending on whether or not the viewing points (the positions of avatars) of two users $U_i$ and $U_j$
are each in the field of vision of the other, and transfers the value $w_{ij}$ to the mixing object
determining part 52D. The mixing object determining part 52D calculates the product,
$p_{ij} = d_{ij} w_{ij}$, of the values $d_{ij}$ and $w_{ij}$ and instructs the speech switching part
52S to connect the speech of the user $U_i$, for which the above-noted product is "1," to the channel
$CH_j$ of the user $U_j$ and the speech of the user $U_j$ to the channel $CH_i$ of the user $U_i$.

Now, a description will be given, with reference to Figs. 11 and 12, of the principles of decision in the distance
decision part 52B and the eye contact deciding part 52C. As shown in Fig. 11, the distance decision part 52B comprises
a distance calculating part 52B1 for calculating the distance between two avatars and a comparison part 52B2 for making
a check to see if the calculated distance is within the threshold value D. The eye contact deciding part 52C comprises
direction-of-avatar calculating parts 52C1 and 52C3 each of which calculates the direction of one of two avatars from
the other, comparison parts 52C2 and 52C4 which compare calculated directions θ_i and θ_j with a predetermined
visual-field threshold value α to determine if each of the avatars is in the field of vision of the other, and a
coincidence deciding logical operation part 52C5 which uses the two results of comparison to determine if the two avatars
establish eye-to-eye contact.

As shown in Fig. 12, a coordinate axis is set in the virtual space VS; let the coordinates of the position P_i of the
avatar A_i be (x_i, y_i) and the coordinates of the position P_j of the avatar A_j be (x_j, y_j). Furthermore, let the
direction-of-eyes vector i of the avatar A_i be (i_x, i_y) and the direction-of-eyes vector j of the avatar A_j be
(j_x, j_y). Incidentally, the direction-of-eyes vector is a unit vector.

The distance between the avatars A_i and A_j can be calculated by the following equation on the basis of the position
coordinates (x_i, y_i) and (x_j, y_j) inputted into the distance calculating part 52B1.

d_{ij} = \{(x_j - x_i)^2 + (y_j - y_i)^2\}^{1/2}    (1)

The distance d_ij is compared with the predetermined threshold value D, and as referred to previously, the distance d_ij
is set to a value "1" or "0," depending on whether 0 < d_ij ≤ D or d_ij > D. The distance value d_ij thus set is
transferred to the mixing object determining part 52D.
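
The distance decision of Eq. (1) can be sketched as follows. This is only an illustrative sketch: the function name `distance_flag` and the tuple representation of positions are assumptions made here, not part of the described apparatus.

```python
import math

def distance_flag(p_i, p_j, threshold_d):
    """Return 1 if 0 < d_ij <= D and 0 otherwise, with d_ij taken from Eq. (1).

    p_i and p_j are (x, y) tuples; threshold_d plays the role of D.
    """
    d_ij = math.hypot(p_j[0] - p_i[0], p_j[1] - p_i[1])
    return 1 if 0 < d_ij <= threshold_d else 0
```

For example, two avatars at (0, 0) and (3, 4) are 5 units apart, so the flag is 1 for D = 5 and 0 for any smaller D.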

The eye contact deciding part 52C calculates a value w_ij which indicates whether the fields of vision of users coincide
with each other, on the basis of the position coordinates (x_i, y_i) and (x_j, y_j) and the direction-of-eyes vectors
(i_x, i_y) and (j_x, j_y) inputted into the direction-of-avatar calculating parts 52C1 and 52C3.

cos θ_i can be determined by calculating the inner product of the vector i and the vector P_ij from the coordinate P_i
to P_j:

\mathbf{i} \cdot \mathbf{P}_{ij} = |\mathbf{i}|\,|\mathbf{P}_{ij}| \cos\theta_i = i_x (x_j - x_i) + i_y (y_j - y_i)

where |i| = 1 because i is a unit vector, and |P_ij| is the distance d_ij between the positions P_i and P_j which is
expressed by Eq. (1). Therefore, the direction of existence of the avatar A_j viewed from the avatar A_i can be
calculated by the following equation:

\cos\theta_i = \frac{i_x (x_j - x_i) + i_y (y_j - y_i)}{\{(x_j - x_i)^2 + (y_j - y_i)^2\}^{1/2}} = \frac{i_x (x_j - x_i) + i_y (y_j - y_i)}{d_{ij}}    (2)

This calculation can also be conducted using the distance d_ij calculated in the distance calculating part 52B1.
The direction θ_i thus calculated by the direction-of-avatar calculating part 52C1 is compared with the predetermined
visual-field threshold value α in the comparison part 52C2, and it is set to a value "1" or "0," depending on whether
0 < θ_i ≤ α or α < θ_i. The thus set value θ_i is inputted into the coincidence deciding logical operation part 52C5.

Similarly, the direction of existence θ_j of the avatar A_i viewed from the avatar A_j can be determined by the following
equation:

\cos\theta_j = \frac{j_x (x_i - x_j) + j_y (y_i - y_j)}{\{(x_i - x_j)^2 + (y_i - y_j)^2\}^{1/2}} = \frac{j_x (x_i - x_j) + j_y (y_i - y_j)}{d_{ij}}    (3)

The direction θ_j thus calculated in the calculating part 52C3 is compared with the predetermined visual-field threshold
value α, and it is set to a value "1" or "0," depending on whether 0 < θ_j ≤ α or α < θ_j. The thus set value θ_j is
inputted into the coincidence deciding logical operation part 52C5. Fig. 12 shows that the avatar A_j is in the field of
vision α of the avatar A_i, whereas the avatar A_i is out of the field of vision α of the avatar A_j and hence is not
recognized. A preferable value of the visual angle α is 45 degrees, for instance.

The direction θ_i of the avatar A_j viewed from the avatar A_i and the direction θ_j of the avatar A_i viewed from the
avatar A_j, outputted from the comparison parts 52C2 and 52C4, respectively, are inputted into the coincidence deciding
logical operation part 52C5, wherein a logical product w_ij = θ_i·θ_j is operated. Thus, the logical operation part 52C5
outputs a value w_ij = 1, which expresses the coincidence of the fields of vision of the users, only when θ_i·θ_j = 1,
indicating that each of the avatars is in the field of vision of the other. When either one of the values θ_i and θ_j is
"0," the logical operation part 52C5 outputs a value w_ij = 0. The output w_ij from the logical operation part 52C5 is
transferred to the mixing object determining part 52D. The mixing object determining part 52D uses the set value d_ij
from the distance decision part 52B and the set value w_ij from the eye contact deciding part 52C to calculate
p_ij = d_ij·w_ij as referred to previously and provides it to the switching part 52S.
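
The eye contact decision and the product p_ij = d_ij·w_ij can be sketched as below. The function names are hypothetical, gaze vectors are assumed to be unit vectors as stated for Fig. 12, and an angle of exactly zero is treated here as lying inside the field of vision, which is an assumption about the boundary case.

```python
import math

def in_field_of_vision(pos, gaze, other, alpha_deg=45.0):
    """Return 1 if `other` lies within alpha_deg of the unit gaze vector at `pos`."""
    dx, dy = other[0] - pos[0], other[1] - pos[1]
    d = math.hypot(dx, dy)
    if d == 0:
        return 0  # coincident positions: direction undefined (assumption)
    # Eq. (2)/(3): inner product of the gaze vector and the joining vector, over d_ij
    cos_theta = max(-1.0, min(1.0, (gaze[0] * dx + gaze[1] * dy) / d))
    theta = math.degrees(math.acos(cos_theta))
    return 1 if theta <= alpha_deg else 0

def eye_contact(pos_i, gaze_i, pos_j, gaze_j, alpha_deg=45.0):
    """w_ij: logical product of the two field-of-vision flags."""
    return (in_field_of_vision(pos_i, gaze_i, pos_j, alpha_deg)
            * in_field_of_vision(pos_j, gaze_j, pos_i, alpha_deg))

def mixing_flag(pos_i, gaze_i, pos_j, gaze_j, threshold_d, alpha_deg=45.0):
    """p_ij = d_ij * w_ij, where d_ij is the 0/1 distance flag."""
    d = math.hypot(pos_j[0] - pos_i[0], pos_j[1] - pos_i[1])
    d_flag = 1 if 0 < d <= threshold_d else 0
    return d_flag * eye_contact(pos_i, gaze_i, pos_j, gaze_j, alpha_deg)
```

Replacing the product in `eye_contact` with a logical OR would give the one-sided variant in which it suffices that one avatar sees the other.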

In response to the instruction of the mixing object determining part 52D, the switching part 52S selects, from voices
received from the channel interface parts INF2 and INF3, those voices which satisfy a condition p_23 = 1, that is, those
voices which are to be connected to the channel CH1 accommodated in the channel interface part INF1; the voices thus
selected are mixed by a mixer 52M1 and the mixed output is provided to the channel interface part INF1. Of voices
received from the channel interface parts INF1 and INF3, those voices which satisfy a condition p_13 = 1, that is, those
voices which are to be connected to the channel CH2 accommodated in the channel interface part INF2, are selected
and mixed by a mixer 52M2, thereafter being transferred to the channel interface part INF2. Similarly, of voices received
from the channel interface parts INF1 and INF2, those voices which satisfy a condition p_12 = 1, that is, those voices
which are to be connected to the channel CH3 accommodated in the channel interface part INF3, are selected and mixed by
a mixer 52M3, thereafter being transferred to the channel interface part INF3.
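
As a rough illustration of the selection and mixing step, the sketch below routes to each listener the voices of those users whose pairwise flag with that listener is 1, following the rule given for the mixing object determining part 52D (speech of U_i goes to CH_j, and vice versa, when p_ij = 1). The data layout (dictionaries keyed by terminal number, frames as lists of samples) and the function name are assumptions made for this example.

```python
def mix_for_listener(listener, speech, p):
    """Sum, sample by sample, the frames of every other user j with p[listener][j] == 1.

    speech: terminal number -> list of samples (all frames the same length)
    p:      p[i][j] is the 0/1 product d_ij * w_ij for the pair (i, j)
    """
    frame_len = len(next(iter(speech.values())))
    mixed = [0.0] * frame_len
    for j, frame in speech.items():
        if j != listener and p[listener][j] == 1:
            mixed = [m + s for m, s in zip(mixed, frame)]
    return mixed

# Three terminals; only users 1 and 2 satisfy the conversation enable condition.
speech = {1: [0.1, 0.2], 2: [0.3, 0.0], 3: [0.5, 0.5]}
p = {1: {2: 1, 3: 0}, 2: {1: 1, 3: 0}, 3: {1: 0, 2: 0}}
```

With this data, user 1 receives only the frame of user 2, user 2 receives only the frame of user 1, and user 3 receives silence.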

The channel interface parts INF1, INF2 and INF3 provide on the channels CH1, CH2 and CH3 the position information
containing the position data and the direction-of-eyes data, received from the position information distributing part 52A,
and the speech data received from the mixers 52M1, 52M2 and 52M3.

In the case of a system which implements the aforementioned conversation enable conditions (a) and (d), the eye
contact deciding part 52C in Fig. 10 need not be provided and the mixing object determining part 52D controls the switch
52S on the basis of the distance d_ij alone. The conversation enable conditions (b) and (e) can be implemented by ORing,
w_ij = θ_i + θ_j, in the coincidence logical operation part 52C5 in Fig. 11.

Third Embodiment 

In the above embodiments, even if the number of avatars engaged in conversation is three or more, they can each
hear voices of all the other avatars in that group but cannot hear voices of an avatar who stays out of the group. This
will be described in respect of such a party as shown in Fig. 13 wherein seven avatars A1 to A7, corresponding to users
of seven terminals 10_1 to 10_7, attend in the virtual space VC. As shown, the users of the avatars A1 and A2
are talking with each other and the users of the avatars A3 to A5 are also talking with one another, but the users of the
avatars A6 and A7 are not engaged in the conversation of either group. If the users of the avatars A6 and A7 could hear
voices of both groups as environment sounds, they would feel the existence of the other avatars in the virtual space
VC like in the real world. Similarly, the users of the avatars A1 and A2 engaged in conversation could also experience
enhanced realism of the virtual space VC if they could hear, as environment sounds, the voices of the avatars A3 to
A5 and sounds made by the avatars A6 and A7.

Now, a description will be given of an embodiment of the virtual space sharing apparatus which allows all users in
a shared virtual space to hear sounds made and voices uttered by them as environment sounds through dynamic
switching of the setting of a speech path switch.

Fig. 14 shows how the speech path switch 52S and the mixer 52M in the server 50 of the apparatus of this embodiment
are interconnected in the case of Fig. 13. Let it be assumed that the terminals 10_1 to 10_7 of the seven users are
present in the same virtual space and that the group of two users corresponding to the terminals 10_1 and 10_2 and the
group of three users corresponding to the terminals 10_3, 10_4 and 10_5 are engaged in conversation independently of
each other. In this embodiment, the mixer 52M is adaptively divided into mixing parts 52Ma and 52Mb corresponding to
the two conversation groups, respectively, and a mixing part 52Mc for all the avatars in the virtual space VC.

The switch 52S has a construction which one-way connects sounds and voices received from all the terminals 10_1
to 10_7 to the mixing part 52Mc. A sound Sc, produced by mixing the sounds and speech data SD1 to SD7 thus one-way
transmitted from all the terminals 10_1 to 10_7, is attenuated by a loss inserting part 5Lc down to a level appropriate for
an environment sound and transmitted to the terminals 10_6 and 10_7 of the users who are not engaged in conversation.
In this way the users of the terminals 10_6 and 10_7 can hear, as environment sounds, the sounds made and voices uttered
by all the users present in the shared virtual space.

On the other hand, the switch 52S two-way connects the speech data SD1 and SD2 sent from the terminals 10_1 and
10_2 to the mixing part 52Ma and, at the same time, it one-way connects the mixed sound Sc, inputted from the mixing
part 52Mc via the loss inserting part 5Lc, to a loss inserting part 5La to attenuate it to such a sound pressure level as
not to hinder conversation, after which the mixed sound Sc is provided to the mixing part 52Ma. The mixing part 52Ma
mixes the speech data SD2 from the terminal 10_2 and the environment sound Sc and sends the mixed sound to the terminal
10_1 via the switch 52S; furthermore, the mixing part 52Ma mixes the speech data SD1 from the terminal 10_1 and the
environment sound Sc and sends the mixed sound to the terminal 10_2 via the switch 52S. Thus, the users of the terminals
10_1 and 10_2 are capable of hearing the environment sound Sc of the reduced sound pressure level while at the same
time carrying on a two-way conversation as in the case of talking to each other over the telephone. As regards the group
of the terminals 10_3 to 10_5, too, the output from the mixing part 52Mc is similarly connected to the mixing part 52Mb
via the loss inserting parts 5Lc and 5Lb, and the mixing part 52Mb generates speech data to be sent to each terminal by
mixing speech data from all the other terminals and the environment sound Sc and sends it to the terminals via the
switch 52S, enabling the users of the terminals to hear the environment sound Sc of the lowered sound pressure level
while carrying on two-way conversation.
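
The signal flow of Fig. 14 can be sketched roughly as follows. The loss values, the dictionary layout and all function names are assumptions chosen for illustration; the description gives no numerical losses for this embodiment, and the dB-to-gain conversion assumes an amplitude (sound pressure) ratio.

```python
def db_to_gain(loss_db):
    """Convert a loss in dB (0 or negative) to an amplitude gain factor."""
    return 10 ** (loss_db / 20.0)

def mix(frames):
    """Sample-wise sum of equally long frames."""
    return [sum(samples) for samples in zip(*frames)]

def scale(frame, gain):
    return [gain * s for s in frame]

def outputs(speech, groups, loss_c_db=-20.0, loss_group_db=-6.0):
    """Return the frame sent to each terminal.

    speech: terminal number -> frame; groups: list of sets of terminals in conversation.
    """
    # Mixing part 52Mc followed by loss inserting part 5Lc: environment sound Sc
    ambient = scale(mix(list(speech.values())), db_to_gain(loss_c_db))
    out = {t: ambient for t in speech}  # users outside any group hear only Sc
    for group in groups:
        # Further loss (5La or 5Lb) so that Sc does not hinder the conversation
        quiet = scale(ambient, db_to_gain(loss_group_db))
        for t in group:
            others = [speech[u] for u in sorted(group) if u != t]
            out[t] = mix(others + [quiet])  # mixing part 52Ma or 52Mb
    return out
```

Note that, as in the description, the environment sound is built from the speech of all terminals, so each user's own voice is present in it at the attenuated level.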

Turning now to Fig. 15, the server 50, which is provided with the switch 52S and the mixer 52M shown in Fig. 14,
will be further described. Let it be assumed, for the sake of brevity, that the number of terminals is three and that the
users of the terminals 10_1 and 10_2 are talking with each other, leaving the user of the terminal 10_3 alone. In Fig. 14
the interface INF and the switch 52S are two-way connected, but in Fig. 15 the channel interface parts INF1 to INF3 and
the switch 52S are shown to be one-way connected with a view to showing the kinds of speech data that are transmitted
and received between them. In Fig. 15, the virtual space and respective terminals transmit audio data and position
information data via an advanced information system INS network and the channels CH1 to CH3 in the server 50. At
first, pieces of data received via the channels CH1 to CH3 are received in the channel interface parts INF1 to INF3,
respectively. The channel interface part INF1 analyzes the received data and, if it is speech data SD1, transfers it to the
switch 52S. Likewise, the channel interface parts INF2 and INF3 analyze the received data and, if they are speech data
SD2 and SD3, transfer them to the switch 52S.

When the received data is position data and direction-of-eyes data, the channel interface parts INF1 to INF3 transfer
these pieces of data to the position information distributing part 52A and write them into the table memory 53. The
position information distributing part 52A copies the position data and direction data received from the channel interface
part INF1 and transfers them to the channel interface parts INF2 and INF3. Similarly, the position information distributing
part 52A copies the position data and direction data received from the channel interface part INF2 and transfers them
to the channel interface parts INF1 and INF3 and copies the position data and direction data received from the channel
interface part INF3 and transfers them to the channel interface parts INF1 and INF2.

A conversation monitoring part 52D discriminates a group of avatars that satisfies the afore-mentioned predetermined
conversation enable conditions on the basis of the position data and direction-of-eyes data read out of the table
memory 53 and defines or specifies in the mixer 52M the mixing part 52Ma which mixes speech data from the terminals
corresponding to the avatars of the group and the mixing part 52Mb which generates an environment sound from speech
data from the terminals corresponding to all avatars in the virtual space. The conversation monitoring part 52D controls
the switch 52S to supply the mixing part 52Ma with the speech data SD1 and SD2 received from the terminals 10_1 and
10_2 corresponding to the avatars of the discriminated group and the mixing part 52Mb with the speech data SD1, SD2
and SD3 from all the avatars. Thus, the switch 52S transfers the speech data SD1 to SD3 received from the channel
interface parts INF1 to INF3 to the mixing part 52Mb. The mixing part 52Mb mixes the speech data SD1 to SD3 and
transfers the mixed sound Sb as an environment sound to the switch 52S via a loss inserting part 5Lb. The switch 52S
sends the environment sound Sb to the channel interface part INF3 corresponding to the terminal 10_3 of the user not
engaged in conversation and, at the same time, provides the sound Sb via a loss inserting part 5La to the mixing part
52Ma. The mixing part 52Ma mixes the sound Sb with the speech data SD1 and SD2 from the channel interface parts
INF1 and INF2, respectively, and sends the mixed sounds SD1+Sb and SD2+Sb to the channel interface parts INF2
and INF1, from which they are sent to the terminals 10_2 and 10_1, respectively.

As the conversation enable condition for the conversation monitoring part 52D to identify the avatars of the
conversation group, it is possible to use the afore-mentioned conditions such as the distance between the avatars of the
users, their mutual existence in the field of vision of the other, or a combination thereof. When the avatars of the
conversation group end the conversation and enter a state in which the conversation enable condition is not satisfied, the
conversation monitoring part 52D cuts off the paths from the channel interface parts INF1 and INF2 to the mixing part
52Ma and controls the switch 52S to send the environment sound Sb from the mixing part 52Mb to the channel interface
parts INF1 to INF3 via the loss inserting part 5Lb.

The Fig. 15 embodiment has been described as being applied to the centralized connection type system; in the
case of the distributed connection type system, as depicted in Fig. 16 (wherein no video-related parts are shown),
position information of avatars received from respective terminals is written into a table memory 12E. A conversation
monitoring part 12T controls a switch 12W to supply a mixing part 2Ma with voice data received from the terminals
corresponding to other avatars detected from their position information read out of the table memory 12E. By this, mixed
voice data of the voice data of all the avatars is obtained from the mixing part 2Ma, and the mixed voice data is provided
to a loss inserting part 2L, wherein a predetermined loss is introduced thereinto to generate the environment sound Sb,
which is provided to a mixing part 2Mb. On the other hand, the conversation monitoring part 12T detects other avatars
which satisfy the condition for conversation directly or indirectly with the avatar of the terminal concerned on the basis
of the position information of other avatars and the position information of the avatar concerned set by the control device
14 and controls the switch 12W to supply the mixing part 2Mb with voice data received from the terminals corresponding
to the above-mentioned other avatars satisfying the conversation enable conditions. As the result of this, the mixing part
2Mb mixes the voices of the other avatars engaged in conversation with the avatar of the terminal user concerned,
together with the environment sound Sb, and the mixed output is provided to the speaker SP.

As described above, the virtual space sharing apparatus of this embodiment lends more realism to the virtual space 
by supplying the environment sound to all avatars regardless of whether they are engaged in conversation or not. 

Fourth Embodiment 

With the apparatus of the above embodiment, it is possible to enhance the realism of the virtual space by feeding
the environment sound to all avatars in the virtual space, but since the voices of other avatars contained in the
environment sound have the same level, each avatar cannot feel a sense of distance with respect to the other avatars.
Besides, mixing of voices from all terminals poses a noise problem when the number of terminals is large. The same
problems also arise in the same conversation group, since the voices of other avatars are of the same level. Now, a
description will be given of an embodiment of the virtual space sharing apparatus adapted to dynamically change the
quality of voices to be mixed on the basis of the position information of individual avatars.

In this embodiment, the voices of other avatars to be mixed for each avatar are graded or classified into some levels
of quality on the basis of such information as listed below.

(a) The position information of the avatar of each user is used to grade the voice of another avatar according to the
length of a straight line joining the position coordinates of both users.

(b) The position information and direction-of-eyes information of the avatar of each user are used to grade the voice
of another avatar, depending on whether another user is in the field of vision of the user concerned.

(c) The position information and direction-of-eyes information of the avatar of each user are used to grade the voice
of another user according to the angle between a straight line joining the position coordinates of both users and
the direction of eyes of the user concerned.

(d) The position information and direction-of-eyes information of the avatar of each user are used to turn the directions
of eyes of the user and another user to a straight line joining their position coordinates to grade the voice of another
user according to the sum of both angles of rotation.

(e) Some of the conditions (a) to (d) are combined to grade the voices of the users.

Figs. 17 to 20 are bird's-eye views of virtual spaces, showing examples of the grading of voices into some levels of
quality in accordance with the relationship between the avatar A1 and the other avatars. For the sake of brevity, this
embodiment will be described in connection with the case of classifying the voices of the other avatars in terms of sound
pressure level.

In the example of Fig. 17, concentric circles are drawn about the avatar A1 and the voices of avatars in circles of
smaller radii are graded up to higher levels of quality. This example uses five levels of quality. That is, the voice of the
avatar A2 closest to the avatar A1 is graded up to the highest level (loss rate: 0 dB) and the voice of the avatar A3 is
graded to the second highest level (loss rate: -10 dB). The voices of the avatars A4 and A5 (loss rate: -13 dB), the voice
of the avatar A6 (loss rate: -16 dB) and the voices of the avatars A7 and A8 (loss rate: -19 dB) are thus graded down in
this order. This processing is carried out for each of all the remaining avatars in the virtual space. While this example
employs the simplest grading method which uses concentric circles, various other methods can be used. For example,
the voice of an avatar in front of the noted one A1 is graded to a higher level than the voice of an avatar behind through
utilization of human hearing or auditory characteristics.
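
A minimal sketch of the concentric-circle grading of Fig. 17 follows. The five loss rates are those quoted above; the ring width of 10 units and the function name are assumptions, since the description does not give the radii of the circles.

```python
import math

# Loss rates of the five quality levels quoted for Fig. 17, nearest ring first
LOSS_LEVELS_DB = [0, -10, -13, -16, -19]

def loss_by_distance(listener, other, ring_width=10.0):
    """Grade another avatar's voice by the ring (hypothetical width) it falls in."""
    d = math.hypot(other[0] - listener[0], other[1] - listener[1])
    ring = min(int(d // ring_width), len(LOSS_LEVELS_DB) - 1)
    return LOSS_LEVELS_DB[ring]
```

Avatars beyond the outermost circle are simply given the lowest level here; a real system might instead drop their voices altogether.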

In the example of Fig. 18, the field of vision of the avatar A1 is calculated from the direction of eyes thereof and the
voices of avatars in that field of vision are preferentially graded up to higher levels. This example employs two levels of
quality. That is, the sound pressure levels of the voices of the avatars A2, A4 and A7 are increased (loss rate: 0 dB),
whereas the sound pressure levels of the voices of the avatars A3, A5, A6 and A8 not in the field of vision are decreased
(loss rate: -19 dB). This processing is carried out for each of the other remaining avatars in the virtual space. In this
case, the visual angle of each avatar is predetermined in the system.
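
The two-level grading of Fig. 18 might look like the following sketch. The 45-degree default visual angle and the function name are assumptions for illustration; the description only says that the visual angle is predetermined in the system.

```python
import math

def loss_by_visibility(pos, gaze, other, visual_angle_deg=45.0):
    """Return 0 dB if `other` is within the visual angle of the unit gaze vector, else -19 dB."""
    dx, dy = other[0] - pos[0], other[1] - pos[1]
    d = math.hypot(dx, dy)
    if d == 0:
        return 0  # same position: treated as visible (assumption)
    cos_theta = max(-1.0, min(1.0, (gaze[0] * dx + gaze[1] * dy) / d))
    theta = math.degrees(math.acos(cos_theta))
    return 0 if theta <= visual_angle_deg else -19
```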

The example of Fig. 19 employs the angle θ between the direction of eyes or line of sight EL of the avatar A1 and
a straight line joining the avatar A1 and each of the other avatars. The voices from the avatars of smaller angle are
preferentially graded up to higher levels of quality. This example uses five levels of quality. That is, the sound pressure
level of the voice of the avatar A4 (θ=0°) on the line of sight EL of the avatar A1 is increased (loss rate: 0 dB); the voices
of the avatars A7 and A5 with -45° ≤ θ ≤ 45° are set to a loss rate of -10 dB; the voices of the avatars A3 and A6 with
-90° ≤ θ < -45° or 45° < θ ≤ 90° are set to a loss rate of -13 dB; the voices of the avatars A9 and A2 with
-135° ≤ θ < -90° or 90° < θ ≤ 135° are set to a loss rate of -16 dB; and the voice of the avatar with
-180° ≤ θ < -135° or 135° < θ ≤ 180° is set to a loss rate of -19 dB. This processing is carried out for each of all the
other remaining avatars.
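
The angle-based grading of Fig. 19 can be sketched as below. The band edges follow the 45-degree steps of this example, reading the first band as |θ| ≤ 45° (excluding θ = 0, which gets 0 dB); that reading, the tolerance `eps` and the function names are assumptions of this sketch.

```python
import math

def signed_angle_deg(pos, gaze, other):
    """Signed angle in degrees from the gaze vector to the line joining pos and other."""
    dx, dy = other[0] - pos[0], other[1] - pos[1]
    cross = gaze[0] * dy - gaze[1] * dx
    dot = gaze[0] * dx + gaze[1] * dy
    return math.degrees(math.atan2(cross, dot))

def loss_by_gaze_angle(theta_deg, eps=1e-9):
    """Map the angle theta to one of the five loss rates of the example."""
    a = abs(theta_deg)
    if a <= eps:
        return 0      # on the line of sight
    if a <= 45:
        return -10
    if a <= 90:
        return -13
    if a <= 135:
        return -16
    return -19
```

Because the bands are symmetric about the line of sight, only the magnitude of the signed angle is needed to pick the level; the sign is kept in `signed_angle_deg` in case left and right are to be treated differently.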

In the example of Fig. 20, the avatar A1 and each of the other avatars are joined by a straight line as indicated by
the broken line and the line of sight of the avatar A1 is turned until it comes into alignment with the straight line and the
turning angle α is calculated. The direction of rotation in this case is the direction in which the angle α decreases.
Similarly, the line of sight of the other avatar is turned until it comes into alignment with the straight line and the
turning angle β is calculated. The direction of turn in this case is the direction in which the angle β decreases. Then,
the sum of both turning angles, α+β=θ, is calculated. The voices of the avatars of the smaller angles are graded up to
higher levels of quality. This example uses five levels of quality. That is, the sound pressure level for the avatar A4 to
which the line of sight of the avatar A1 conforms (θ=0°) is increased, whereas the loss rates for the avatar A3 with
0° < θ ≤ 45°, the avatar A5 with 45° < θ ≤ 90°, the avatar A6 with 90° < θ ≤ 135° and the avatar A2 with
135° < θ ≤ 180° are set to -10 dB, -13 dB, -16 dB and -19 dB, respectively. This processing is carried out for each of all
the other avatars in the virtual space.
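
A sketch of the grading by the sum of turning angles (Fig. 20) is given below. Each turning angle is taken as the unsigned angle between an avatar's unit gaze vector and the line joining the two avatars. Sums above 180 degrees are not covered by the bands quoted above and are assigned the lowest level here; that choice, like the function names, is an assumption.

```python
import math

def turning_angle_deg(pos, gaze, target):
    """Unsigned angle needed to turn the unit gaze vector at `pos` toward `target`."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    d = math.hypot(dx, dy)
    if d == 0:
        return 0.0  # same position: no turn needed (assumption)
    cos_a = max(-1.0, min(1.0, (gaze[0] * dx + gaze[1] * dy) / d))
    return math.degrees(math.acos(cos_a))

def loss_by_mutual_gaze(pos_a, gaze_a, pos_b, gaze_b, eps=1e-9):
    """Loss rate from theta = alpha + beta, the sum of both turning angles."""
    theta = (turning_angle_deg(pos_a, gaze_a, pos_b)
             + turning_angle_deg(pos_b, gaze_b, pos_a))
    if theta <= eps:
        return 0
    if theta <= 45:
        return -10
    if theta <= 90:
        return -13
    if theta <= 135:
        return -16
    return -19  # also used for sums above 180 degrees (assumption)
```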

The methods of determining the loss rate as described above in respect of Figs. 17 to 20 may be used singly or in
combination. With the combined use of the methods of Figs. 17, 18 and 20, for instance, it is possible to make the voice
of the avatar in the field of vision of the avatar A1 larger as the distance between them decreases and as the degree of
coincidence of their directions of eyes increases.

Fig. 21 illustrates an example of the configuration of the server 50 which effects the above-described voice quality
control in the centralized connection type virtual space sharing apparatus. For the sake of simplicity, the server 50 is
shown to accommodate three terminals. The server 50 is connected to terminals 10_1, 10_2 and 10_3 (see Figs. 2A and
2B) via the channels CH1, CH2 and CH3 and receives data therefrom in the channel interface parts INF1, INF2 and INF3,
respectively.

When the received data is position data and direction-of-eyes data, the channel interface parts INF1 to INF3 transfer
them to the position information distributing part 52A and, at the same time, write them into the table memory 53.
As in the case of the Fig. 10 embodiment, the position information distributing part 52A copies the position data and
direction-of-eyes data received from the channel interface part INF1 and transfers them to the channel interface parts
INF2 and INF3, copies and transfers the position data and direction-of-eyes data received from the channel interface part
INF2 to those INF1 and INF3, and copies and transfers the position data and direction-of-eyes data received from the
channel interface part INF3 to those INF1 and INF2.

A loss determining part 52E1 uses the position data and direction-of-eyes data read out of the table memory 53 to
calculate, by the methods described previously with reference to Figs. 17 to 20, the loss rates of voices of other users
to be provided to the user of the terminal accommodated in the channel interface part INF1. Based on the loss rates
thus determined, the loss determining part 52E1 sends loss-inserting instructions to loss inserting parts 5L12 and 5L13
corresponding to the users of the terminals accommodated in the channel interface parts INF2 and INF3. Similarly, a
loss determining part 52E2 also sends loss-inserting instructions to loss inserting parts 5L21 and 5L23 corresponding to
the users of the terminals accommodated in the channel interface parts INF1 and INF3. Also a loss determining
part 52E3 similarly sends loss-inserting instructions to loss inserting parts 5L31 and 5L32 corresponding to the users of
the terminals accommodated in the channel interface parts INF1 and INF2.

The channel interface part INF1 analyses received data and, if it is speech data, transfers the speech data SD1 to
the loss inserting parts 5L21 and 5L31. Likewise, the channel interface part INF2 analyses received data and, if it is
speech data, transfers the speech data SD2 to the loss inserting parts 5L12 and 5L32. Also the channel interface part
INF3 similarly analyses received data and, if it is speech data, then transfers the speech data SD3 to the loss inserting
parts 5L13 and 5L23. By this, the above-mentioned loss is inserted in the speech data fed to each loss inserting part.
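
The per-pair loss insertion can be sketched as follows, with one loss value for every (listener, speaker) pair in the manner of the loss inserting parts 5L12, 5L13 and so on. The nested-dictionary layout, the function names and the amplitude-ratio reading of the dB loss rate are assumptions of this sketch.

```python
def db_to_gain(loss_db):
    """Convert a loss rate in dB (0 or negative) to an amplitude gain factor."""
    return 10 ** (loss_db / 20.0)

def insert_losses(speech, loss_db):
    """Attenuate each speaker's frame separately for each listener.

    speech:  speaker number -> frame (list of samples)
    loss_db: loss_db[listener][speaker] is the loss rate decided for that pair
    Returns attenuated[listener][speaker], ready to be mixed per listener.
    """
    return {
        listener: {
            speaker: [db_to_gain(db) * s for s in speech[speaker]]
            for speaker, db in per_speaker.items()
        }
        for listener, per_speaker in loss_db.items()
    }
```

Summing the frames in `attenuated[listener]` sample by sample then gives the signal that the corresponding speech mixing part would send back to that listener.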

A speech mixing part 52M1 mixes the speech data with the losses inserted therein by the loss inserting parts 5L12 and
5L13 and transfers the mixed output to the channel interface part INF1, from which it is sent to the terminal 10_1 via the
channel CH1. A speech mixing part 52M2 mixes the speech data with the losses inserted therein by the loss inserting
parts 5L21 and 5L23 and transfers the mixed output to the channel interface part INF2, from which it is sent to the
terminal 10_2 via the channel CH2. Similarly, a speech mixing part 52M3 also mixes speech data with losses inserted
therein by the loss inserting parts 5L31 and 5L32 and transfers the mixed output to the channel interface part INF3, from
which it is sent to the terminal 10_3 via the channel CH3.

Fig. 22 illustrates in block form an example of one terminal 10 which is connected to the server 50 shown in Fig.
21. The parts corresponding to those in Figs. 3 and 7 are identified by the same reference numerals and characters.

The channel interface part 12A of the terminal control part 12 is connected via the communication network NW to
the server 50 depicted in Fig. 21. The control input processing part 12D sends position data and direction-of-eyes data
of the user's avatar inputted from the control device 14 to the server 50 via the channel interface part 12A and the
communication network NW and, at the same time, sends the same data to a viewing point shift processing part 12Gv
of the video image generating part 12G.

The viewing point shift processing part 12Gv of the video image generating part 12G uses the position data and
direction-of-eyes data of the avatar received from the control input processing part 12D to shift the viewing point in the
virtual space and display on a display 13 video images that come into the field of vision. An other-avatar shift processing
part 12Gm forms avatar images of other users at specified positions and in specified directions in the visual field image
in correspondence with position data and direction-of-eyes data of the other users' avatars received from the server 50
via the channel interface part 12A and displays them on the display 13.

The voice received in the channel interface part 12A is outputted to the speaker SP. The voice of the user of this
terminal, inputted from the microphone MC, is sent via the channel interface part 12A to the server 50.

While Figs. 21 and 22 show examples of the constructions of the server and the terminal for use in the centralized 
connection type virtual space display apparatus, the same principles described above can also be applied to the dis- 
tributed connection type virtual space sharing apparatus. 

Fig. 23 illustrates an example of the configuration of one terminal 10 for use in the distributed connection type virtual
space sharing apparatus which effects the afore-mentioned speech quality control. In this example, the number of
terminals of other users is three. In the centralized connection type system, the terminal 10 of Fig. 22 sends and receives
position data, direction-of-eyes data and speech data to and from the server 50 of Fig. 21 and voices are mixed in the
server 50 in correspondence with respective users. In contrast thereto, in the distributed connection type system of Fig.
23, a speech quality control part 12Q is provided in the terminal control part 12 of each user terminal and, based on
the position data and/or direction-of-eyes data received from the other terminals and stored in the table memory 12E,
the sound pressure level for each of the other users' avatars is determined in a loss determining part 2E by a desired
one of the methods described previously with respect to Figs. 17 to 20; the losses thus determined are set in loss
inserting parts 2L1, 2L2 and 2L3, respectively. The pieces of speech data received from the other terminals are attenuated
by the losses set in the loss inserting parts 2L1 to 2L3 and then mixed by a mixer 2M, thereafter being outputted to the
speaker SP. The basic principles and operations are the same as those described previously.
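
The attenuate-then-mix path of the Fig. 23 terminal can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the circuit of the embodiment: the function names, the loss rule (a fixed number of decibels per unit of distance, with a cap) and the sample values are all hypothetical, and the loss-determining methods of Figs. 17 to 20 are not reproduced.

```python
import math

def loss_db(own_pos, other_pos, db_per_unit=1.0, max_db=40.0):
    """Hypothetical loss rule: attenuation grows linearly with distance, up to a cap."""
    d = math.dist(own_pos, other_pos)
    return min(db_per_unit * d, max_db)

def insert_loss(samples, db):
    """Attenuate one speech stream by `db` decibels (role of a loss inserting part)."""
    gain = 10.0 ** (-db / 20.0)
    return [s * gain for s in samples]

def mix(streams):
    """Sum equally long sample streams frame by frame (role of the mixer)."""
    return [sum(frame) for frame in zip(*streams)]

# Assumed example data: own avatar at the origin, two other avatars.
own = (0.0, 0.0)
others = {
    "user1": ((0.0, 0.0), [1.0, 1.0]),   # same position: no loss
    "user2": ((20.0, 0.0), [1.0, 1.0]),  # 20 units away: 20 dB loss
}
attenuated = [insert_loss(s, loss_db(own, p)) for p, s in others.values()]
mixed = mix(attenuated)
```

With these assumed values the distant speaker contributes one tenth of the amplitude of the near one, so the mixed output is dominated by the nearby avatar.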

As described above, according to this embodiment, at the time of mixing users' voices received from respective
terminal units, their speech quality is changed according to the distance between the respective users' avatars and that
of the user of the terminal concerned, the degree of eye contact between them, or similar condition through utilization
of the position data and direction-of-eyes data of the respective users' avatars which are received together with their
voices; hence, it is possible to create in the virtual space an environment in which all users are allowed to clearly hear
sounds and voices all around them, immediately perceive the directions of sounds and voices and understand each
other even if their avatars move in the virtual space.

Fifth Embodiment 

While the Fig. 23 embodiment lends realism to the virtual space by changing the sound pressure levels of voices 
of users to be mixed according to the distances and/or directions of eyes of the corresponding avatars relative to that 
of the user of each particular one of the terminals, it is also possible to request the speech data sending terminal or 
server to send the speech data of specified quality. 

Fig. 24 illustrates another embodiment of the terminal for use in the distributed connection type system, as is the
case with the Fig. 23 embodiment. According to this embodiment, each user terminal requests the other user terminals
to send their voices of speech quality specified on the basis of the position and/or direction-of-eyes relationship between
their avatars in Fig. 23. The requested terminals each send speech data of the specified quality to the requesting
terminals; this enhances the auditory sense of reality of the mixed speech more than in the above-described embodiments,
lending more realism to the virtual space. Furthermore, since the quality of speech data to be sent can be lowered
according to the circumstances, the average amount of information sent can be reduced; hence, the traffic congestion
of the communication network can be eased accordingly.

The Fig. 24 embodiment has a construction in which a speech quality requesting part 12R, a speech quality request
analyzing part 12S and a speech processing part 12K are added to the Fig. 23 embodiment. The speech quality
requesting part 12R is supplied with speech quality determining parameters for respective avatars which are calculated from
their position data and/or direction-of-eyes data in a loss determining part 12E to determine losses, such as distances
from each avatar to the others; the speech quality requesting part 12R determines the necessary speech quality
corresponding to each distance and provides the information to a packet assembling and disassembling part 12B. The
packet assembling and disassembling part 12B assembles into a packet a signal which requests each terminal to send
speech of the determined quality and sends the packet to each terminal via the channel interface part 12A. The speech
quality that is defined in terms of distance can be obtained, for example, by changing the transmission rate of speech
data. For instance, four distance threshold values D1 to D4 are predetermined which bear the relationship D1<D2<D3<D4.
Each avatar requests another avatar with the distance d in the range of D1<d≤D2 to send speech data of a 64 Kb/s
transmission rate, another avatar with the distance d in the range of D2<d≤D3 to send speech data of a 32 Kb/s
transmission rate, another avatar with the distance d in the range of D3<d≤D4 to send speech data of a 16 Kb/s transmission
rate and still another avatar with the distance d in the range of D4<d to send speech data of an 8 Kb/s transmission rate.
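
The mapping from distance to requested transmission rate can be sketched as below. This is only an illustration: the numeric threshold values and the function name are hypothetical, and the treatment of distances not larger than D1, which the description leaves open, is an assumption made here (they are given the highest rate).

```python
# Hypothetical threshold values in virtual-space units, chosen so that D1 < D2 < D3 < D4.
D1, D2, D3, D4 = 2.0, 5.0, 10.0, 20.0

def requested_rate_kbps(d):
    """Return the speech transmission rate (Kb/s) to request from an avatar at distance d."""
    if d <= D2:
        # Covers D1 < d <= D2; distances d <= D1 are ALSO mapped here (assumption).
        return 64
    if d <= D3:
        return 32
    if d <= D4:
        return 16
    return 8

# Example: build the per-terminal requests from a table of distances (assumed data).
distances = {"terminal_a": 3.0, "terminal_b": 7.5, "terminal_c": 42.0}
requests = {t: requested_rate_kbps(d) for t, d in distances.items()}
```

A chain of comparisons is used rather than a lookup table so that each band boundary stays visible next to the rate it selects.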

On the other hand, the speech quality requests received from other terminals are analyzed in the speech quality
request analyzing part 12S, which identifies the speech transmission rates requested by the individual terminals and
provides the information to the speech processing part 12K. The speech processing part 12K digitally processes the
speech signal inputted from the microphone MC to convert it into speech data of the requested bit rates, which are
provided to the packet assembling and disassembling part 12B. The packet assembling and disassembling part 12B sends
the speech data of the respective bit rates as packets addressed to the requesting terminals via the channel interface part 12A.
In the packet assembling and disassembling part 12B, the speech data packets received from the respective
terminals in response to the requests of the terminal concerned are disassembled into speech data of the requested bit
rates, which are provided to the loss inserting parts 2L1, 2L2 and 2L3, respectively, wherein they are subjected to the
same processing as described above in respect of the Fig. 23 embodiment, thereafter being mixed by the mixer 2M and
then provided to the speaker SP.

Thus, according to this embodiment, the bit rate (and consequently the speech quality in terms of frequency
characteristic) increases as the avatar concerned is approached; this provides an enhanced sense of reality compared with the
Fig. 23 embodiment. On the other hand, the bit rate of the speech data decreases with distance from the avatar
concerned. Hence, the amount of information sent is reduced as a whole and consequently the traffic congestion of the
communication network is eased accordingly. This embodiment has been described as being applied to the distributed
connection type system; in the case of the centralized connection type system, the same results as described above
could be obtained by employing a construction in which the terminal concerned requests the server to send the speech
data of the specified quality and the server responds to the request to send the speech data received from the respective
terminals to the requesting terminal after changing the speech quality (the transmission rate) of the speech data.
Alternatively, it is possible to utilize a construction in which the server itself determines the transmission rate of the speech
data to be sent to each terminal on the basis of the speech quality determined for the avatar of the terminal as described
previously with respect to Fig. 21 and sends the speech data of the determined bit rate.

Sixth Embodiment 

While the above embodiments have been described to give the users of the virtual space an auditory sense of reality
by controlling the speech quality of other avatars on the basis of the positional relationship between each avatar and
the remaining ones, it is also possible to visually lend realism to the virtual space by controlling the image quality of
other avatars on the basis of the above-said positional relation. For example, the image quality of facial videos of users
is increased as the avatar of the terminal concerned is approached; that is, the closer to the avatar of the terminal user
concerned, the higher the image quality of facial videos of other users. A description will be given of embodiments based
on this concept.

An embodiment will be described as being applied to the centralized connection type system. As mentioned
previously, all the terminals share the virtual space, and hence have the same virtual space model, and each user can
freely move in the virtual space. Other users can also move in the same virtual space; to recognize this, each user
prepares avatars of other users in his virtual space and sends his facial video and position information of his avatar
(position coordinates and direction of eyes in the virtual space) to other terminals. Based on the position information of
avatars of other users received therefrom, each user creates their avatars at specified positions in his virtual space and
pastes thereto users' facial videos of sizes corresponding to the distances from the avatar of the terminal concerned to
the other avatars.

A description will be given of an on-demand type configuration using such a centralized connection type system as
shown in Fig. 2A or 2B. Each user terminal picks up a high quality image of the user with a video camera, digitizes it
for each frame and sends it to the server. The server has an image memory corresponding to each user and, upon every
reception of a user's image, overwrites and stores it in the image memory. The quality of video image is defined by the
number of frames per second, resolution (lines/mm), or a combination thereof. The number of frames per second
contributes to the smoothness of movement of the video image and the resolution contributes to its definition. In
accordance with the distance and/or the degree of eye contact between its avatar and each of the avatars of the other users
in the virtual space, each terminal specifies, for each user, a different interval at which to send its video image from
the server (the number of frames per second) or different resolution of the video image. The server sends to the requesting
terminal the video image of the specified user with the specified resolution and/or the number of frames per second;
this permits reduction of the amount of information that is sent throughout the system.

Fig. 25 is a diagram of the virtual space viewed from above, showing how the terminal specifies the quality of the
video image that it requests the server to send. In Fig. 25 there is depicted the field of vision of the avatar A1 in the virtual
space. The avatar A2 is closest to the avatar A1 and also keeps eye contact therewith; hence, for the avatar A2, the
terminal of the avatar A1 requests the server to send a video image of the highest quality. Since the avatar A5 is facing
the avatar A1 but remains a little out of eye contact with the latter, the terminal requests from the server a video image of
lower quality. As for the avatar A3, the terminal of the avatar A1 requests from the server a video image of the lowest quality
since no eye contact is established between them. The avatar A6 is outside the field of vision of the avatar A1, and
consequently the terminal of the avatar A1 does not request from the server any video image of the avatar A6. Fig. 26
shows display images of the visual field image that the avatar A1 observes in the virtual space depicted in Fig. 25. The
broken lines indicate distance threshold values D1, D2, D3 and D4 relative to the avatar A1 (which are not displayed in
practice). The avatar images in respective regions defined by these threshold values are each displayed in the quality
determined as described previously.
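
The decision illustrated by Fig. 25 can be sketched roughly as follows. The sketch considers only the field of vision and the degree of eye contact and ignores distance; the angular limits (`half_fov`, `contact`, `near_contact`), the dictionary layout and the function names are hypothetical values chosen for illustration, not values taken from the embodiment.

```python
import math

def bearing_offset(pos, eyes_deg, target):
    """Absolute angle (degrees) between an avatar's direction of eyes and the line to target."""
    to_target = math.degrees(math.atan2(target[1] - pos[1], target[0] - pos[0]))
    return abs((to_target - eyes_deg + 180.0) % 360.0 - 180.0)

def requested_quality(me, other, half_fov=45.0, contact=15.0, near_contact=40.0):
    """Quality to request for `other`, or None when it is outside my field of vision."""
    if bearing_offset(me["pos"], me["eyes"], other["pos"]) > half_fov:
        return None                      # not visible: no video is requested
    theirs = bearing_offset(other["pos"], other["eyes"], me["pos"])
    if theirs <= contact:
        return "highest"                 # the other avatar looks straight at me
    if theirs <= near_contact:
        return "lower"                   # facing me, but slightly out of eye contact
    return "lowest"                      # visible, but no eye contact

# Assumed example: my avatar at the origin looking along the +x axis.
a1 = {"pos": (0.0, 0.0), "eyes": 0.0}
facing = {"pos": (10.0, 0.0), "eyes": 180.0}
behind = {"pos": (-10.0, 0.0), "eyes": 0.0}
```

In this example `requested_quality(a1, facing)` selects the highest quality and `requested_quality(a1, behind)` yields no request at all, mirroring the cases of the avatars A2 and A6.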

Fig. 27 illustrates an example of the configuration of the server 50 in the virtual space sharing apparatus of the
centralized connection type system. For the sake of brevity, the server 50 is shown to accommodate three terminals and
no audio-related parts are shown.

The server 50 sends and receives position information (position coordinates and direction of eyes) and video images
to and from the terminals via channels CH1, CH2 and CH3. The data received from the channels CH1, CH2 and CH3 are
received in the channel interface parts INF1, INF2 and INF3, respectively. The channel interface parts INF1 to INF3 each
analyze the received data and, if it is video image data, transfer it to a video storage part 52K. The video storage part
52K writes the received video image in a memory which stores video images in correspondence with the terminals
accommodated. When the received data is position information (position coordinates and direction of eyes), the channel
interface parts INF1 to INF3 each transfer it to a position information distributing part 52A. The position information
distributing part 52A copies the position information received from the channel interface part INF1 and transfers it to the
channel interface parts INF2 and INF3; it copies the position information received from the channel interface part INF2
and transfers it to the channel interface parts INF1 and INF3; and it copies the position information received from the
channel interface part INF3 and transfers it to the channel interface parts INF1 and INF2. When the received data is
image request information, the channel interface parts INF1 to INF3 each transfer it to an image request analyzing part
52J. The image request analyzing part 52J analyzes the received request and informs the video storage part 52K of the
requested image and, at the same time, informs a video processing part 52N of the requested resolution and/or the
number of frames per second and the requesting terminal. The video storage part 52K reads out of its memory the
requested image specified by the image request analyzing part 52J and transfers it to the video processing part 52N.
The video processing part 52N converts the video image received from the video storage part 52K to the resolution
and/or the number of frames per second specified by the image request analyzing part 52J and, on the basis of the
specified requesting terminal information, sends the video image to the requesting terminal via the channel interface
part INF1 and the channel CH1, the channel interface part INF2 and the channel CH2, or the channel interface part INF3
and the channel CH3.
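
The three kinds of data handled by the server of Fig. 27 can be pictured with the short sketch below. It is a simplified model under stated assumptions: the class and message names are hypothetical, each channel interface is reduced to an outgoing queue, a video image is a plain two-dimensional list, and resolution conversion is stood in for by keeping every `step`-th row and column.

```python
def downsample(frame, step):
    """Stand-in for resolution conversion: keep every `step`-th row and column."""
    return [row[::step] for row in frame[::step]]

class Server:
    def __init__(self, terminal_ids):
        self.outbox = {t: [] for t in terminal_ids}  # one outgoing queue per terminal
        self.video_store = {}                        # latest image per terminal

    def receive(self, sender, kind, payload):
        if kind == "video":
            # Overwrite the stored image of the sending terminal.
            self.video_store[sender] = payload
        elif kind == "position":
            # Copy the position information to every terminal except the sender.
            for t in self.outbox:
                if t != sender:
                    self.outbox[t].append(("position", sender, payload))
        elif kind == "image_request":
            # payload = (terminal whose image is wanted, downsampling step)
            target, step = payload
            frame = self.video_store.get(target)
            if frame is not None:
                self.outbox[sender].append(("video", target, downsample(frame, step)))

server = Server([1, 2, 3])
server.receive(1, "position", (0.0, 0.0, 90.0))
server.receive(2, "video", [[1, 2], [3, 4]])
server.receive(1, "image_request", (2, 2))
```

After these three calls, terminals 2 and 3 each hold a copy of the position information of terminal 1, and terminal 1 holds a reduced copy of the image stored for terminal 2.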

Fig. 28 illustrates an example of the construction of the terminal in the virtual space sharing apparatus of the
centralized connection type system. No audio-related parts are shown. The terminal 10 sends and receives video images
and position information to and from the server 50 via a communication network NW and a channel CH. At first, the
terminal 10 picks up the video image of the user by the video camera VC and transfers it to a digital video processing
part 12J. The digital video processing part 12J digitizes the received video image frame by frame and sends it to the
server 50 via the channel interface part 12A and the channel CH. When the user changes the position of the viewing
point through the control device 14, updated position information (coordinates and direction of eyes) is provided to the
control input processing part 12D. The control input processing part 12D sends the position information to the server
50 via the channel interface part 12A and the channel CH. At the same time, the control input processing part 12D sends
the position information to the viewing point shift processing part 12Gv as well. The viewing point shift processing part
12Gv responds to the updated position information to change the visual field image in the virtual space to be presented
to the user and displays it on the display 13. The control input processing part 12D sends move information to a
distance/eye contact deciding part 12N. On the other hand, when the received data is position information, the channel
interface part 12A transfers it to an other avatar position and direction-of-eyes analyzing part 12L. The other avatar
position and direction-of-eyes analyzing part 12L transfers position coordinates and directions of eyes of other avatars
to an other avatar shift processing part 12Gm and the distance/eye contact deciding part 12N, respectively. The
distance/eye contact deciding part 12N operates in the same manner as do the distance decision part 52B and the eye
contact deciding part 52C described previously with respect to Figs. 10 and 11. That is, based on the position information
of the user's avatar received from the control input processing part 12D and the other avatar move information received
from the other avatar position and direction-of-eyes analyzing part 12L, the distance/eye contact deciding part 12N
decides the distance and/or eye contact between the user and each of the other avatars, then decides the image quality
for the avatar by the method described previously in respect of Fig. 25 and requests the server 50 via the channel
interface part 12A and the channel CH to send the video image of the specified quality. In this instance, it is also possible
to employ the method described previously with respect to Fig. 19; that is, as the angle θ between the direction EL of
eyes of the avatar of the user and each avatar increases, the resolution and/or the number of frames per unit time is reduced.
Alternatively, the method described in respect of Fig. 20 may also be used; that is, as the sum of angles α and β between
the directions of eyes of the two avatars and the straight line joining them increases, the resolution and/or the number
of frames per unit time is reduced.
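
The second alternative, in which the quality falls as the sum of the two angles between the directions of eyes and the line joining the avatars grows, can be sketched as follows. The linear relation between the angle sum and the frame rate, the limits `max_fps` and `min_fps`, and the function names are assumptions made for illustration; the two avatars are assumed to be at distinct positions. In the code the two angles are named `alpha` and `beta`.

```python
import math

def angle_to(pos, eye_dir, target):
    """Angle (radians) between a direction-of-eyes vector and the line from pos to target."""
    vx, vy = target[0] - pos[0], target[1] - pos[1]
    dot = eye_dir[0] * vx + eye_dir[1] * vy
    norm = math.hypot(*eye_dir) * math.hypot(vx, vy)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def frames_per_second(pos_a, eyes_a, pos_b, eyes_b, max_fps=30, min_fps=1):
    """Hypothetical rule: frame rate falls linearly as alpha + beta grows from 0 to 2*pi."""
    alpha = angle_to(pos_a, eyes_a, pos_b)
    beta = angle_to(pos_b, eyes_b, pos_a)
    fraction = 1.0 - (alpha + beta) / (2.0 * math.pi)
    return max(min_fps, round(max_fps * fraction))

# Assumed example: two avatars one unit apart on the x axis.
face_to_face = frames_per_second((0, 0), (1, 0), (1, 0), (-1, 0))
back_to_back = frames_per_second((0, 0), (-1, 0), (1, 0), (1, 0))
```

Two avatars looking straight at each other receive the maximum frame rate, while two avatars looking directly away from each other fall to the minimum. The same shape of rule could be applied to the resolution instead of the frame rate.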

When the received data is video data, the channel interface part 12A transfers it to the other avatar shift processing
part 12Gm. Based on the position information of other avatars received from the other avatar position and direction-of-
eyes analyzing part 12L, the other avatar shift processing part 12Gm changes the position and direction of eyes of each
avatar, then pastes the video image (facial videos) received from the channel interface part 12A to the corresponding
avatar in a size corresponding to the distance from the user's viewing point to the avatar, then converts the avatar image
to the position viewed from the user's viewing point and displays it on the display 13.

Fig. 29 illustrates an example of the configuration of the terminal in the virtual space sharing apparatus of the
distributed connection type system. This example differs from the Fig. 28 example in that the terminal 10 directly sends
and receives video data and position information (position coordinates and direction of eyes) to and from other terminals
via the communication network NW and the channel CH. To send video images of the quality specified by other terminals,
each terminal is provided with a video storage and processing part 12Q and a video request analyzing part 12R in place
of the digital video processing part 12J. The video camera VC picks up video of the user and transfers it to the video
storage and processing part 12Q. The video storage and processing part 12Q digitizes the received video image on a
framewise basis and stores it. When the received data is video image request information, the channel interface part
12A transfers the request information to the video request analyzing part 12R. The video request analyzing part 12R
analyzes the received request and informs the video storage and processing part 12Q of the requested resolution and
the requesting terminal. The video storage and processing part 12Q converts its stored video image to the specified
resolution and/or number of frames per unit time and sends it to the requesting terminal via the channel interface part 12A and
the channel CH. The other arrangements and operations are the same as in the Fig. 27 example, and hence no
description will be given of them.

As described above, according to the embodiments of Figs. 27, 28 and 29, each terminal in the distributed connection
type system or the server in the centralized connection type system stores high-quality video of each user in its memory
and, only when requested by each terminal, sends the video in specified quality. Hence, these embodiments effectively
avoid the traffic congestion of the communication network and lessen the burden of processing for receiving video images
at the terminal, resulting in the effect of preventing degradation of image quality.

In the above, the on-demand system has been described. When the on-demand system is not utilized, since in
the centralized connection type system of Figs. 27 and 28 the latest position information of avatars of all terminals is
stored in the position information distributing part 52A of the server 50 in Fig. 27, the distances between the avatar of
each terminal and the avatars of the other terminals are calculated through the use of the stored position information,
then the levels of resolution and/or the numbers of frames per second of the video images to be sent to each terminal
from the others are determined according to the distances between them, and the video images are processed in the
video processing part 52N accordingly. In this instance, the distance/eye contact deciding part 12N need not be provided
in the terminal of Fig. 28. In the case of the distributed connection type system, the levels of image quality of the avatar
of each terminal user relative to the avatars of the other users are determined in the distance/eye contact deciding part
12N on the basis of the relationship between the position information of the avatars of the other users received in the
terminal of Fig. 29 from the other terminals and the position information of the avatar of the user of this terminal, and
the video image of the terminal user is sent at the determined levels of quality from the video storage and processing part
12Q to the other terminals, respectively. Also in this instance, the video image request analyzing part 12R is not needed
and, as indicated by the broken line, the distance/eye contact deciding part 12N informs the video storage and processing
part 12Q of the determined image quality.

It will be apparent that many modifications and variations may be effected without departing from the scope of the 
novel concepts of the present invention. 

Claims 

1. A virtual space sharing apparatus which has a plurality of terminals connected to a communication network and 
sharing a predetermined common virtual space and generates and displays a visual field image which changes as 
an avatar representing a user of each terminal moves in said virtual space at said each terminal, said each terminal 
comprising: 

control means which generates signals for selectively specifying its position and direction of eyes in said 
virtual space; 

visual field image generating means which generates a visual field image in said direction of eyes in said 
virtual space from said position as a viewing point; 

position information sending and receiving means which sends said position and said direction of eyes as
position information to said communication network and receives therefrom position information sent from other
terminals;

avatar image forming means which forms avatar images representative of users of said other terminals in
said visual field at positions corresponding to said received position information; and

display means which displays a combined image containing said visual field image and said avatar images.

2. The apparatus of claim 1, which has server means connected to said each terminal via said communication network
and in which said each terminal comprises speech sending and receiving means for sending speech data of its user
to said server means via said communication network and for receiving speech data of said users of said other
terminals from said server means and speech output means for outputting said received speech data as speech;
said server means comprising:

select means which calculates, from position information received from said terminals, the distances between
said avatar of the user of said each terminal and other avatars and selects those of said other avatars which have
said distance within a predetermined threshold value; and

mixer means which, in a group consisting of any one of said avatars and said avatars selected by said select
means relative thereto, mixes speech data from the terminals corresponding to said avatars except each particular
one and sends said mixed speech data to the terminal corresponding to said each particular avatar.

3. The apparatus of claim 1, wherein said terminals are interconnected via said communication network and each of
said terminals comprises:

speech sending and receiving means which sends speech data of its user to all the other terminals via said
communication network and receives therethrough speech data of users of said other terminals;

select means which calculates, from position information received from said other terminals, the distances
between the avatar of the user of said each terminal and said other avatars and selects those of said other avatars
which have said distance within a predetermined threshold value;

mixer means which mixes speech data received from the terminals corresponding to said avatars selected
by said select means and outputs the mixed sound data; and

speech output means for outputting said mixed speech data as a sound.

4. The apparatus of claim 2 or 3, wherein said select means is means which selects, for each avatar, those of the other
avatars whose distances therefrom are within said threshold value and which are present in the field of vision of 
said each avatar. 

5. The apparatus of claim 4, wherein said select means is means which additionally selects that one of the other
avatars which is outside of the field of vision of said each avatar but inside of the field of vision of any one of said
selected avatars and provides speech data from said additionally selected avatar to said mixer means.

6. The apparatus of claim 2 or 3, wherein said select means is means which selects, for each avatar, those other
avatars whose distances therefrom are within said threshold value and which are each present in the field of vision
of the other.

7. The apparatus of claim 6, wherein said select means is means which additionally selects that one of the other 
avatars which is outside of the field of vision of said each avatar but inside of the field of vision of any one of said 
selected avatars and provides speech data from said additionally selected avatar to said mixer means. 

8. The apparatus of claim 1, which has server means connected to said each terminal via said communication network
and in which said each terminal comprises speech sending and receiving means for sending speech data of its user
to said server means via said communication network and for receiving speech data of said users of said other
terminals from said server means and speech output means for outputting said received speech data as speech;
said server means comprising:

first mixer means which mixes speech data received from terminals corresponding to all of said avatars and 
outputs environment sound data; 

conversation monitor means which, on the basis of position information received from each of said terminals, 
searches for a group of avatars which mutually satisfy a conversation enable condition; 

second mixer means which generates, for the terminal of each avatar of said group, mixed sound data by 
mixing speech data received from the terminal corresponding to the other avatar of said group and said environment 
sound; and 

means which sends said mixed sound data to said terminal of said each avatar. 

9. The apparatus of claim 1, wherein said terminals are interconnected via said communication network and each of
said terminals comprises:

speech sending and receiving means which sends speech data of its user to all the other terminals via said
communication network and receives therethrough speech data of users of said other terminals;

conversation monitor means which, on the basis of position information received from said other terminals,
searches for a group of avatars which satisfy a conversation enable condition with respect to the avatar of the user
of said each terminal;

first mixer means which mixes speech data received from terminals corresponding to all of said avatars and
outputs environment sound data;

second mixer means which mixes speech data received from the terminals corresponding to said avatars of
said group and said environment sound to generate mixed speech data; and

speech output means for outputting said mixed speech data as a sound.

10. The apparatus of claim 8 or 9, wherein said conversation monitor means presents, as said conversation enable
condition, at least one condition that the distance between the avatar of the user of said each terminal and the other
avatar in said group, calculated from position information received from said terminals, is within a predetermined
threshold value.

11. The apparatus of claim 10, wherein said conversation enable condition includes a condition that said other avatar
is inside of the field of vision of said avatar of the user of said each terminal.

12. The apparatus of claim 8 or 9, which further comprises channel switching means which one-way connects speech
data received from all of said terminals to said first mixer means, two-way connects to said second mixer means
speech data received from said avatars of said group and one-way connects an environment sound data outputted
from said first mixer means to said second mixer means.

13. The apparatus of claim 8 or 9, which further comprises loss inserting means which inserts a loss into said
environment sound data outputted from said first mixer means and provides it to said second mixer means.

14. A virtual space sharing apparatus which has a plurality of terminals connected to a communication network and
sharing a predetermined common virtual space and generates and displays a visual field image which changes as 
an avatar representing a user of each terminal moves in said virtual space at said each terminal, said each terminal 
comprising: 

sending and receiving means which receives speech data of users of said terminals except said each terminal
and position information of their avatars and sends speech data of the user of said each terminal and position
information of its avatar to said terminals;

speech quality determining means which determines the levels of quality for speech data of other users in 
accordance with the relationship of the avatars of said other users to the avatar of said each user through the use 
of position information of said avatars of said other users received from said other terminals; 

speech quality control means which controls the quality of speech data of said other users in accordance 
with the levels of quality determined therefor relative to the avatar of said each user; 

mixer means which mixes said quality-controlled speech data of said other users in correspondence with 
said each user and outputs mixed sound data; and 

acoustic signal output means which outputs said mixed sound data from said mixer means as an acoustic 

45 signal. 

15. A virtual space sharing apparatus which has a plurality of terminals connected to a server and sharing a predetermined
common virtual space and generates and displays a visual field image which changes as an avatar representing
a user of each terminal moves in said virtual space at said each terminal, said server comprising:

sending and receiving means which receives speech data of users of said terminals except said each terminal
and position information of their avatars and sends speech data of the user of said each terminal and position
information of its avatar to said terminals;

speech and position information distributing means which distributes speech data of the user and position
information of its avatar, received from said each terminal, to all the other terminals via said sending and receiving
means;

speech quality determining means which determines the speech quality for speech data of other users in
accordance with the relationship of the avatars of said other users to the avatar of said each user through the use
of position information of said avatars of said other users received from said other terminals;

speech quality control means which controls the speech quality of speech data of said other users in accordance
with the speech quality determined therefor relative to the avatar of said each user; and

mixer means which mixes said quality-controlled speech data of said other users in correspondence with
said each user and outputs mixed sound data;

wherein said mixed sound data is sent via said sending and receiving means to corresponding ones of said
terminals.

16. The apparatus of claim 14 or 15, wherein said speech quality is a sound pressure level. 

17. The apparatus of claim 14 or 15, wherein said position information of said avatar includes its position coordinate
and said speech quality determining means is means which determines the speech quality of each of said other
users so that its speech quality becomes lower with an increase in the length of a straight line joining the position
coordinate of the avatar of said each user and the position coordinate of the avatar of said other user.

18. The apparatus of claim 14 or 15, wherein said position information of said avatar includes its position coordinate
and direction of eyes and said speech quality determining means is means which determines the speech quality of
each of said other users, through utilization of said position coordinate and direction of eyes of the avatar of said
each user, so that the speech quality of said other user is high or low, depending on whether its avatar is in the field
of vision of the avatar of said each user.

19. The apparatus of claim 14 or 15, wherein said position information of said avatar includes its position coordinate
and direction of eyes and said speech quality determining means determines the speech quality of each of said
other users, through utilization of the position coordinates and direction of eyes of said each user and each of said
other users, so that the speech quality of each of said other users becomes lower with an increase in the angle
between a straight line joining the coordinates of the avatar of said each user and each of said other users and the
direction of eyes of the avatar of each of said other users.

20. The apparatus of claim 14 or 15, wherein said position information of said avatar includes its position coordinate
and direction of eyes and said speech quality determining means is means which determines the speech quality of
each of said other users so that said speech quality becomes lower with an increase in the sum of the angles of
rotation of the directions of eyes of the avatars of said each user and each of said other users to a straight line
joining their coordinates.

21. The apparatus of claim 14, in which said each terminal further comprises: means which sends to the terminal of each
of said other users a quality request signal requesting said speech quality determined therefor; and speech data
processing means which responds to said quality request signal from each of said other users to send thereto
the speech data of said each terminal at a transmission rate specified by said quality request signal.

22. The apparatus of claim 15, wherein said sending and receiving means of said server is means which sends said mixed
sound data to said each terminal at the transmission rate corresponding to said speech quality determined for the
avatar of said each terminal.

23. A virtual space sharing apparatus which has a plurality of terminals connected to a server and sharing a predetermined
common virtual space and generates and displays a visual field image which changes as an avatar representing
a user of each terminal moves in said virtual space at said each terminal,

wherein said each terminal comprises: sending and receiving means which receives video image data of
each of other users and position information of its avatar from said server means and sends video image data of
the user of said each terminal and position information of its avatar to said server; camera means which picks up
the video image of said user of said each terminal and outputs a video signal; digital processing means which
digitally processes said video signal and sends it to said server means via said sending and receiving means;
quality specifying and video requesting means which determines the image quality for the avatar of each of said
other users on the basis of the relationship between the position information of the avatar of each of said other users
and the position information of the avatar of said each user and sends via said sending and receiving means to said
server means a video request signal requesting a video image of said determined quality; and means which generates
an avatar image on the basis of the video image data of each of said other users received from said server
means and displays it in a visual field image of said each user at a position specified by the position information of
the avatar of each of said other users; and

wherein said server means comprises: position information distributing means which sends position information
of the avatar of the user, received from said each terminal, to all the other terminals; video memory means
which stores video image data in correspondence with said terminals; means which writes received video image
data into said video memory means; and quality processing means which analyzes a video image request signal
received from each of said other users, then reads out the requested video image data from said video memory
means and sends it to a requesting terminal after processing it to have specified quality.

24. A virtual space sharing apparatus which has a plurality of terminals connected to a server and sharing a predetermined
common virtual space and generates and displays a visual field image which changes as an avatar representing
a user of each terminal moves in said virtual space at said each terminal,
said each terminal comprising:

sending and receiving means which receives video image data of the users of other terminals and position
information of their avatars and sends to said other terminals the video image of the user of said each terminal and
position information of its avatar;

quality specifying and video requesting means which determines the image quality for the avatar of each of
said other users on the basis of the relationship between the position information of the avatar received from each
of said other terminals and the position information of the avatar of said each terminal and sends via said sending
and receiving means to each of said other terminals a video request signal requesting a video image of said determined
quality;

video memory means which stores video data;

camera means which picks up the video image of the user of said each terminal;

video processing means which digitally processes said video signal and writes it into said video memory
means;

means which analyzes a video image request signal received from each of said other users, then reads out
the requested video image data from said video memory means and sends it to a requesting terminal after processing
it to have specified quality;

and means which generates an avatar image on the basis of the video image data received from each of
said other terminals and displays it in a visual field image of said each user at a position specified by the position
information received from each of said other terminals.

25. The apparatus of claim 23 or 24, wherein said image quality is resolution of the video image. 

26. The apparatus of claim 23 or 24, wherein said image quality is the number of frames of the video image data per
unit time.

27. The apparatus of claim 23 or 24, wherein said position information of said avatar includes its position coordinate
and said quality specifying and video requesting means includes means which determines the image quality of each
of said other users so that its image quality becomes lower with an increase in the length of a straight line joining
the position coordinate of the avatar of the user of said each terminal and the position coordinate of the avatar of
each of said other users in the field of vision of the former.

28. The apparatus of claim 23 or 24, wherein the position information of said avatar includes its position coordinate and
said quality specifying and video image requesting means includes means which determines the image quality of
each of said other users so that said image quality becomes lower with an increase in the angle between the direction
of eyes of the avatar of the user of each terminal and the position of the avatar of each of said other users in the
field of vision of the former.

29. The apparatus of claim 23 or 24, wherein said position information of said avatar includes its position coordinate
and direction of eyes and said quality specifying and video image requesting means includes means which determines
the image quality of each of said other users so that said image quality becomes lower with an increase in
the sum of the angles of rotation of the directions of eyes of the avatar of the user of said each terminal
and the avatar of each of said other users to a straight line joining their coordinates in the field of vision of the avatar
of the user of said each terminal.

30. A display method for a virtual space which has a plurality of terminals connected to a communication network and
sharing a predetermined common virtual space and generates and displays a visual field image which changes as
an avatar representing a user of each terminal moves in said virtual space at said each terminal, said each terminal
performing the steps of:

(a) generating, by control means, signals for selectively specifying its position and direction of eyes in said virtual
space;

(b) sending said position and said direction of eyes as position information to said communication network;

(c) generating a visual field image in said direction of eyes in said virtual space from said position as a viewing
point;

(d) receiving, from said communication network, position information sent thereto from other terminals;

(e) forming avatar images representative of users of said other terminals in said visual field at positions corresponding
to said received position information; and

(f) displaying a combined image containing said visual field image and said avatar images.

31. The method of claim 30, wherein server means is provided which is connected via said communication network to
said terminals, respectively, said step (b) includes a step of sending speech data of the user of said each terminal
to said server means via said communication network, and said step (d) includes a step of receiving speech data
of other terminals from said server means and outputting said received speech data as speech;

wherein said server means performs the steps of:

(g) calculating, from position information received from said other terminals, the distances between said avatar
of the user of said each terminal and the avatars of said other terminals and selecting those of said avatars of
said other terminals which have said distance within a predetermined threshold value; and

(h) mixing, in a group consisting of any one of said avatars and said avatars selected relative thereto,
speech data from the terminals corresponding to said avatars except each particular one and sending said
mixed speech data to the terminal corresponding to said each particular avatar.

32. The method of claim 30, wherein said terminals are interconnected via said communication network, said step (b)
includes a step of sending speech data of the user of said each terminal to all the other terminals via said communication
network and said step (d) includes a step of receiving speech data of the users of said
other terminals via said communication network;

said each terminal performing the steps of:

(g) calculating, from position information received from said other terminals, the distances between the avatar of
the user of said each terminal and the avatars of said other terminals and selecting those of the avatars of said
other terminals which have said distance within a predetermined threshold value; and

(h) mixing speech data received from the terminals corresponding to said selected avatars and outputting the
mixed sound data as a sound.

33. The method of claim 31 or 32, wherein said step (g) includes a step of selecting, for each avatar, those of the other 
avatars whose distances therefrom are within said threshold value and which are present in the field of vision of 
said each avatar. 

34. The method of claim 33, wherein said step (g) includes a step of additionally selecting that one of the other avatars
which is outside of the field of vision of said each avatar but inside of the field of vision of any one of said selected
avatars.

35. The method of claim 31 or 32, wherein said step (g) includes a step of selecting, for each avatar, those other avatars
whose distances therefrom are within said threshold value and which are each present in the field of vision of the other.

36. The method of claim 35, wherein said step (g) includes a step of additionally selecting that one of the other avatars
which is outside of the field of vision of said each avatar but inside of the field of vision of any one of said selected
avatars.

37. The method of claim 30, wherein server means is provided which is connected via said communication network to 
said terminals, respectively, said step (b) includes a step of sending speech data of the user of said each terminal 
to said server means via said communication network, and said step (d) includes a step of receiving speech data 
of other terminals from said server means and outputting said received speech data as speech; 

wherein said server means performs the steps of: 

(g) mixing speech data received from terminals corresponding to said avatars to generate environment sound 
data; 

(h) searching for a group of avatars which mutually satisfy a conversation enable condition on the basis of the 
position information received from said other terminals; 

(i) generating mixed sound data by mixing speech data received from the terminals corresponding to the avatars
of said group other than each one of them and said environment sound data;

(j) sending said mixed sound data, generated for the terminal of said each avatar, to said terminal.

38. The method of claim 30, wherein said terminals are interconnected via said communication network, said step (b)
includes a step of sending speech data of the user of said each terminal to all the other terminals via said communication
network and said step (d) includes a step of receiving speech data of the users of said other terminals
via said communication network;

said each terminal performing the steps of:

(g) mixing speech data received from terminals corresponding to said avatars to generate environment sound
data;

(h) searching for a group of avatars which mutually satisfy a conversation enable condition on the basis of the
position information received from said other terminals;

(i) generating mixed sound data by mixing speech data received from the terminals corresponding to the avatars
of said group other than each one of them and said environment sound data;

(j) outputting said mixed sound data as a sound.



39. The method of claim 37 or 38, wherein said conversation enable condition in said step (h) includes at least one
condition that the distance between the avatar of the user of said each terminal and the other avatar in said group,
calculated from position information received from said terminals, is within a predetermined threshold value.

40. The method of claim 39, wherein said conversation enable condition includes a condition that said other avatar is
inside of the field of vision of said avatar of the user of said each terminal.

41. The method of claim 37 or 38, wherein said step (i) includes a step of inserting a loss in said environment sound
data and then mixing it with said received speech data.

42. A display method for a virtual space which has a plurality of terminals connected to a communication network and
sharing a predetermined common virtual space and generates and displays a visual field image which changes as
an avatar representing a user of each terminal moves in said virtual space at said each terminal, said each terminal
performing the steps of:

(a) receiving speech data of the users and position information of their avatars from the other terminals;

(b) sending speech data of the avatar of said each terminal and position information of its avatar to each of said
other terminals;

(c) determining the levels of quality for speech data of the users of said other terminals in accordance with the
relationship of the avatars of the users of said other terminals to the avatar of the user of said each terminal through
the use of position information of said avatars of said other users received from said other terminals;

(d) controlling the quality of speech data of said other users in accordance with the levels of quality determined
therefor relative to the avatar of said each user; and

(e) mixing said quality-controlled speech data of said other users in correspondence with said each user and
outputting mixed sound data as an acoustic signal.

43. A display method for a virtual space which has a plurality of terminals connected to server means and sharing a 
predetermined common virtual space and generates and displays a visual field image which changes as an avatar 
representing a user of each terminal moves in said virtual space at said each terminal, said server means
performing the steps of:



(a) receiving speech data of the users and position information of their avatars from the other terminals; 

(b) sending to each terminal speech data and position information of their avatars; 

(c) distributing the speech data of the user and position information of its avatar received from said each terminal 
to all the other terminals; 

(d) determining the levels of quality for speech data of the users of said other terminals in accordance with the 
relationship of the avatars of the users of said other terminals to the avatar of the user of said each terminal 
through the use of position information of said avatars of said other users received from said other terminals; 

(e) controlling the quality of speech data of said other users in accordance with the levels of quality determined 
therefor relative to the avatar of said each user; and 

(f) mixing said quality-controlled speech data of said other users in correspondence with said each user and 
outputting and sending mixed sound data to said terminals corresponding thereto.

44. The method of claim 42 or 43, wherein said speech quality is a sound pressure level.

45. The method of claim 42 or 43, wherein said position information of said avatar includes its position coordinate
and said speech quality determining step includes a step of determining the speech quality of each of said other users
so that its speech quality becomes lower with an increase in the length of a straight line joining the position coordinate
of the avatar of said each user and the position coordinate of the avatar of said other user.

46. The method of claim 42 or 43, wherein said position information of said avatar includes its position coordinate and
direction of eyes and said speech quality determining step includes a step of determining the speech quality of each
of said other users, through utilization of said position coordinate and direction of eyes of the avatar of said each
user, so that the speech quality of said other user is high or low, depending on whether its avatar is in the field of
vision of the avatar of said each user.

47. The method of claim 42 or 43, wherein said position information of said avatar includes its position coordinate and
direction of eyes and said speech quality determining step includes a step of determining the speech quality of each
of said other users, through utilization of the position coordinates and direction of eyes of said each user and each
of said other users, so that the speech quality of each of said other users becomes lower with an increase in the
angle between a straight line joining the coordinates of the avatar of said each user and each of said other users
and the direction of eyes of the avatar of each of said other users.

48. The method of claim 42 or 43, wherein said position information of said avatar includes its position coordinate and
direction of eyes and said speech quality determining step includes a step of determining the speech quality of each
of said other users so that said speech quality becomes lower with an increase in the sum of the angles of rotation
of the directions of eyes of the avatars of said each user and each of said other users to a straight line joining their
coordinates.

49. The method of claim 42, in which said each terminal further performs the steps of: sending to the terminal of each
of said other users a quality request signal requesting said speech quality determined therefor; and responding to
said quality request signal from each of said other users to send thereto the speech data of said each terminal at
a transmission rate specified by said quality request signal.

50. The method of claim 43, wherein said server means further performs a step of sending said mixed sound data to
each of said other terminals at a transmission rate corresponding to said speech quality determined for the avatar
thereof.

51. A display method for a virtual space which has a plurality of terminals connected to a server via a communication 
network and sharing a predetermined common virtual space and generates and displays a visual field image which 
changes as an avatar representing a user of each terminal moves in said virtual space at said each terminal, 

wherein said each terminal performs the steps of:

(a) picking up the video image of the user of said each terminal, digitally processing the video signal and sending
the video image data of said user to said server;

(b) sending position information of the avatar of said each terminal to said server;

(c) receiving position information of the avatar of each user from said server;

(d) determining the image quality for the avatar of each of said other users on the basis of the relationship
between the position information of the avatar of each of said other users and the position information of the
avatar of said each user;

(e) sending to said server a video request signal requesting a video image of said determined
quality; and

(f) generating an avatar image on the basis of the video image data of each of said other users received from
said server and displaying it in a visual field image of said each user at the position specified by the position
information of the avatar of each of said other users; and

wherein said server performs the steps of:

(g) writing video image data received from each terminal in correspondence therewith;

(h) sending position information of the avatar of the user received from each terminal to all the other terminals;
and

(i) analyzing a video image request signal received from each of said other users, then reading out the requested
video image data from said video memory means and sending it to the requesting terminal after processing it
to have specified quality.

52. A display method for a virtual space which has a plurality of terminals interconnected via a communication network 
and sharing a predetermined common virtual space and generates and displays a visual field image which changes 
as an avatar representing a user of each terminal moves in said virtual space at said each terminal, 
said each terminal performing the steps of:

(a) picking up the video image of the user of said each terminal, digitally processing the video signal and writing
it in video memory means;

(b) sending position information of the avatar of said each terminal to the other terminals;

(c) receiving position information of the avatar of the user of each of said other terminals;

(d) determining the image quality for the avatar of each of said other users on the basis of the relationship
between the position information of the avatar of each of said other users and the position information of the
avatar of said each user;

(e) sending to each of said other terminals a video request signal requesting a video image of said determined
quality;

(f) receiving the video request signal from each of said other terminals;

(g) sending video data of a user read out of said video memory means to the requesting terminals after processing
it to have quality specified by said video request signal; and

(h) generating an avatar image on the basis of the video image data of each of said other users received therefrom
and displaying it in a visual field image of said each user at the position specified by the position information
of the avatar of each of said other users.



53. The method of claim 51 or 52, wherein said image quality is resolution of the video image.

54. The method of claim 51 or 52, wherein said image quality is the number of frames of the video image data per unit
time.

55. The method of claim 51 or 52, wherein said position information of said avatar includes its position coordinate and
said step (d) includes a step of determining the image quality of each of said other users so that its image quality
becomes lower with an increase in the length of a straight line joining the position coordinate of the avatar of the
user of said each terminal and the position coordinate of the avatar of each of said other users in the field of vision
of the former.

56. The method of claim 51 or 52, wherein the position information of said avatar includes its position coordinate and
said step (d) includes a step of determining the image quality of each of said other users so that said image quality
becomes lower with an increase in the angle between the direction of eyes of the avatar of the user of each terminal
and the position of the avatar of each of said other users in the field of vision of the former.

57. The method of claim 51 or 52, wherein said position information of said avatar includes its position coordinate and
direction of eyes and said step (d) includes a step of determining the image quality of each of said other users so
that said image quality becomes lower with an increase in the sum of the angles of rotation of the directions of eyes
of the avatar of the user of said each terminal and the avatar of each of said other users to a straight
line joining their coordinates in the field of vision of the avatar of the user of said each terminal.
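
As an illustration only, and not as part of the claims, the selection by distance threshold and field of vision recited in the method claims, and the lowering of quality with distance and with the sum of the angles of rotation of the directions of eyes, might be sketched as follows. The names Avatar, select_partners and quality_level, the two-dimensional coordinates, the half-angle model of the field of vision and the linear fall-off formula are all assumptions made for this sketch; the claims do not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class Avatar:
    x: float        # position coordinate (2-D for simplicity; an assumption)
    y: float
    eye_dir: float  # direction of eyes, in radians


def distance(a: Avatar, b: Avatar) -> float:
    """Length of the straight line joining the two position coordinates."""
    return math.hypot(b.x - a.x, b.y - a.y)


def angle_to(viewer: Avatar, other: Avatar) -> float:
    """Absolute angle between the viewer's direction of eyes and the
    straight line joining the viewer to the other avatar (0..pi)."""
    bearing = math.atan2(other.y - viewer.y, other.x - viewer.x)
    diff = (bearing - viewer.eye_dir + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def in_field_of_vision(viewer: Avatar, other: Avatar, half_angle: float) -> bool:
    """Hypothetical field-of-vision model: a cone of +/- half_angle."""
    return angle_to(viewer, other) <= half_angle


def select_partners(me, others, threshold, half_angle):
    """Select avatars within the distance threshold and inside my field of vision."""
    return [o for o in others
            if distance(me, o) <= threshold
            and in_field_of_vision(me, o, half_angle)]


def quality_level(me: Avatar, other: Avatar, max_distance: float) -> float:
    """Illustrative quality in [0, 1]: lower with a longer joining line and
    with a larger sum of the two rotation angles (assumed linear fall-off)."""
    dist_term = max(0.0, 1.0 - distance(me, other) / max_distance)
    angle_sum = angle_to(me, other) + angle_to(other, me)
    angle_term = 1.0 - angle_sum / (2 * math.pi)
    return dist_term * angle_term
```

In this sketch an avatar that is close and face to face with the viewer is selected and given a high level, while one that is equally close but behind the viewer is not selected and would be given a lower level; a real system would map the level onto a sound pressure level, a resolution or a frame rate.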



EP 0 696 018 A2 




FIG. 2A [drawing; legible labels: TERM, SERVER, LAN, ISDN]
FIG. 4A, FIG. 4B, FIG. 4C [drawings; no legible labels]

FIG. 5 [drawing; legible labels: CHANNEL INTF, CONN CONT, POSI, EYE DIR, and table entries for avatars A1, A2, A3]

FIG. 6 [drawing; legible field labels: AVATAR ID (AID), MESSAGE ID (MID), SPACE ID (SID), x, y, z (COV), ED, STATE FLAG (SFLG)]

FIG. 7 [drawing; legible labels: VIDEO, AUDIO, GEN, OUTPUT, TABLE, CPU, A/V INPUT, VC, MC, CONT INPUT PROC, INTF, NW]

FIG. 8 [table]

AID     CFLG    SFLG    COV (x, y, z)    ED
AID1    1       F1      x1, y1, z1       (illegible)
AID2    0       F2      x2, y2, z2       (illegible)
AID3    0       F3      (illegible)      (illegible)

FIG. 9A, FIG. 9B, FIG. 9C, FIG. 9D, FIG. 9E, FIG. 9F [drawings; no legible labels]

FIG. 12 [drawing; legible labels: j = (jx, jy), Pj(xj, yj), Pi(xi, yi)]

FIG. 13 [drawing; no legible labels]

FIG. 22 [drawing; legible labels: NW, CH INTF, OTHER AVATAR SHIFT PROC, VPS PROC, CONT INPUT PROC, DISPLAY, SP, MC]

FIG. 25 







FIG. 28

(Block diagram; image not reproduced. Block labels: CH, CH INTF, VIDEO IMAGE DIG PROC, OTHER AVATAR POS/EYE DIR ANAL, DIST/EYE CONT DECIDING, CONT INPUT PROC. Reference numeral: 12N.)







FIG. 29

(Block diagram; image not reproduced. Block labels: CH, NW, CH INTF, VIDEO IMAGE STORAGE/PROC, VIDEO IMAGE REQ ANAL, OTHER AVATAR POS/EYE DIR ANAL, DIST/EYE CONT DECIDING, OTHER AVATAR SHIFT PROC, VPS PROC, CONT INPUT PROC, VIDEO CAMERA, DISPLAY. Reference numerals: 12, 12A, 12D, 12G, 12Gm, 12Gv, 12L, 12N, 12Q, 12R, 13.)






(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 696 018 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 14.05.1997 Bulletin 1997/20

(43) Date of publication A2: 07.02.1996 Bulletin 1996/06

(21) Application number: 95112163.1

(22) Date of filing: 02.08.1995

(51) Int. Cl.6: G06T 15/00, G06F 17/00



(84) Designated Contracting States:
DE FR GB

(30) Priority: 03.08.1994 JP 182058/94
27.12.1994 JP 325858/94
13.01.1995 JP 4235/95
16.06.1995 JP 150501/95
05.07.1995 JP 169919/95

(71) Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Shinjuku-ku, Tokyo 163-19 (JP)

(72) Inventors:

• Suzuki, Gen
Fujisawa-shi, Kanagawa (JP)

• Sugawara, Shohei
Yokosuka-shi, Kanagawa (JP)

• Tanigawa, Hiroya
Miura-shi, Kanagawa (JP)

• Moriuchi, Machio
Yokohama-shi, Kanagawa (JP)

• Nagashima, Yoshio
Yokosuka-shi, Kanagawa (JP)

• Nakajima, Yasuhiro
Yokosuka-shi, Kanagawa (JP)

• Arita, Hiroyuki
Tokyo (JP)

• Murakami, Yumi
Yokosuka-shi, Kanagawa (JP)

(74) Representative: Hoffmann, Eckart, Dipl.-Ing. et al
Patentanwalt,
Bahnhofstrasse 103
82166 Gräfelfing (DE)



(54) Shared virtual space display method and apparatus using said method 



(57) A plurality of terminals are connected to a
server via a communication network and share a
predetermined common virtual space. Each terminal
continually sends to the server the position coordinates
of the viewing point and the direction of eyes of its user
in the virtual space, and the visual field image viewed
from that viewing point is displayed on a display. Based
on the position coordinates and direction of eyes of the
avatar of each of the other terminals, received from
those terminals via the server, each terminal generates
an avatar image in the specified direction and at the
specified position and displays it in the visual field. The
server is continually supplied with the latest position
information of the avatar from every terminal and, when
the distance between any two avatars becomes smaller
than a threshold value, connects the speech channels
of the two terminals corresponding to these avatars.
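
The distance test described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the dictionary of avatar positions keyed by avatar ID, and the numeric threshold are all hypothetical. The abstract only states that channels are connected when two avatars come closer than a threshold; releasing a channel when they move apart again is an added assumption.

```python
import math
from itertools import combinations


def pairs_within_threshold(positions, threshold):
    """Return the set of avatar-ID pairs whose distance is below threshold.

    positions: dict mapping avatar ID -> (x, y, z), as last reported
    by each terminal (hypothetical data layout).
    """
    close = set()
    for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2):
        if math.dist(pa, pb) < threshold:
            close.add((a, b))
    return close


def update_speech_channels(connected, positions, threshold):
    """Compare the wanted channel set with the currently connected one.

    Returns (wanted, to_connect, to_disconnect). Disconnecting pairs
    that have moved apart is an assumption of this sketch.
    """
    wanted = pairs_within_threshold(positions, threshold)
    to_connect = wanted - connected
    to_disconnect = connected - wanted
    return wanted, to_connect, to_disconnect


if __name__ == "__main__":
    positions = {"AID1": (0, 0, 0), "AID2": (1, 0, 0), "AID3": (10, 0, 0)}
    connected, new, dropped = update_speech_channels(set(), positions, 2.0)
    print(new, dropped)
```

A server following this sketch would call `update_speech_channels` each time a terminal reports a new avatar position and act only on the two difference sets, so channels that are already connected are left untouched.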



FIG. 26 






European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 95 11 2163

DOCUMENTS CONSIDERED TO BE RELEVANT

Columns: Category | Citation of document with indication, where appropriate, of relevant passages | Relevant to claim | CLASSIFICATION OF THE APPLICATION (Int.Cl.6)



PROCEEDINGS OF THE VIRTUAL REALITY ANNUAL INTERNATIONAL SYMPOSIUM, SEATTLE, SEPT. 18 - 22, 1993,
18 September 1993, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS,
pages 394-400, XP000457709
CARLSSON C ET AL: "DIVE - A MULTI-USER VIRTUAL REALITY SYSTEM"
* abstract; figure 1 *
* paragraph 2.1 - paragraph 2.2 *

COMPUTER GRAPHICS, MAY 1994, USA,
vol. 28, no. 2, ISSN 0097-8930,
pages 127-130, XP000603080
ROBINETT W: "Interactivity and individual viewpoint in shared virtual worlds: the big screen vs. networked personal displays"
* page 128, right-hand column, line 20 - line 68 *

FUJITSU SCIENTIFIC AND TECHNICAL JOURNAL, OCT 1990, JAPAN,
vol. 26, no. 3, ISSN 0016-2523,
pages 197-206, XP000178534
FUKUDA K ET AL: "Hypermedia personal computer communication system: Fujitsu Habitat"
* paragraph 2; figures 1,2 *
* paragraph 3.4.1 *

-/--



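Relevant to claim (column entries, in order): 1,30; 1,30; 1,30; 2-4, 31-33

CLASSIFICATION OF THE APPLICATION (Int.Cl.6): G06T15/00, G06F17/00

TECHNICAL FIELDS SEARCHED (Int.Cl.6): G06F, G06T, H04N, H04M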



The present search report has been drawn up for all claims

Place of search: BERLIN
Date of completion of the search: 27 February 1997
Examiner: Jonsson, P.O.

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document

T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons

& : member of the same patent family, corresponding document






Category (column entries, in order): A; Y; P,A

PROC: HUMAN-COMPUTER INTERACTION, USA,
vol. 2, 8 August 1993, ISBN 0-444-89540-X,
1993, AMSTERDAM, NL, ELSEVIER,
pages 694-698, XP000607918
BENFORD S ET AL: "Awareness, focus and aura: a spatial model of interaction in virtual worlds"
* abstract *
* paragraph 4.1 - paragraph 4.2 *

PATENT ABSTRACTS OF JAPAN
vol. 018, no. 516 (P-1806), 28 September 1994
& JP 06 175942 A (TOSHIBA CORP), 24 June 1994,
* abstract *

EP 0 659 006 A (IBM) 21 June 1995
* abstract; figure 7 *
* column 1, line 55 - column 2, line 10 *
* column 2, line 37-43 *
* column 3, line 14-24 *

EP 0 634 857 A (IBM) 18 January 1995
* abstract; figure 3 *

Relevant to claim (column entries, in order): 14-22, 42-50; 23-29, 51-57; 14-22, 42-50; 14-22, 42-50; 14-22, 42-50









Category (column entry): P,A

PROCEEDINGS OF THE VIRTUAL REALITY ANNUAL INTERNATIONAL SYMPOSIUM, SEATTLE,
18 September 1993,
pages 408-414, XP000457710
OHYA J ET AL: "REAL-TIME REPRODUCTION OF 3D HUMAN IMAGES IN VIRTUAL SPACE TELECONFERENCING"
* abstract; figures 1,5 *
* paragraph 2.1 *
* paragraph 3 *

PROCEEDINGS OF THE VIRTUAL REALITY ANNUAL INTERNATIONAL SYMPOSIUM, SEATTLE,
18 September 1993,
pages 478-485, XP000457716
CAUDELL T P ET AL: "NEURAL MODELING OF FACE ANIMATION FOR TELECOMMUTING IN VIRTUAL REALITY"
* paragraph 1; figure 4 *

COMMUNICATIONS OF THE ASSOCIATION FOR COMPUTING MACHINERY,
vol. 37, no. 8, 1 August 1994,
pages 83-97, XP000484285
HIROSHI ISHII ET AL: "ITERATIVE DESIGN OF SEAMLESS COLLABORATION MEDIA"
* page 83; figures 5,5 *

COMPUTER JOURNAL,
vol. 37, no. 8, 12 January 1995,
pages 653-668, XP000486153
BENFORD S ET AL: "SUPPORTING COOPERATIVE WORK IN VIRTUAL ENVIRONMENTS"
* paragraph 4; figures 1-7 *
* paragraph 5.1 *
* paragraph 5.2 *

Relevant to claim (column entries, in order): 23,24, 51,52; 23,24, 51,52; 23,24, 51,52; 14-29, 42-57






European Patent Office

EP 95 11 2163 - B -

CLAIMS INCURRING FEES

The present European patent application comprised at the time of filing more than ten claims.

All claims fees have been paid within the prescribed time limit. The present European search report has been
drawn up for all claims.

Only part of the claims fees have been paid within the prescribed time limit. The present European search
report has been drawn up for the first ten claims and for those claims for which claims fees have been paid,
namely claims:

No claims fees have been paid within the prescribed time limit. The present European search report has been
drawn up for the first ten claims.

[X] LACK OF UNITY OF INVENTION

The Search Division considers that the present European patent application does not comply with the requirement of unity of
invention and relates to several inventions or groups of inventions,
namely:

1. Claims 1-13,30-41:

A virtual space sharing apparatus and method aimed at generating a visual field
image from position and direction of the eyes of an Avatar in the virtual space.

2. Claims 14-22,42-50:

A virtual space sharing apparatus and method aimed at sending and receiving
speech data and to mix said speech data of each user and output it.

3. Claims 23-29,51-57:

A virtual space sharing apparatus and method aimed at sending and receiving
video image data, to process said data and to produce output images based on
said process.

[X] All further search fees have been paid within the fixed time limit. The present European search report has
been drawn up for all claims.

Only part of the further search fees have been paid within the fixed time limit. The present European search
report has been drawn up for those parts of the European patent application which relate to the inventions in
respect of which search fees have been paid,
namely claims:

None of the further search fees has been paid within the fixed time limit. The present European search report
has been drawn up for those parts of the European patent application which relate to the invention first
mentioned in the claims,
namely claims:






